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Abstract 

We  discuss  statistical  and  computational  aspects  of  inverse  or  parameter  estima¬ 
tion  problems  based  on  Ordinary  Least  Squares  and  Generalized  Least  Squares  with 
appropriate  corresponding  data  noise  assumptions  of  constant  variance  and  noncon¬ 
stant  variance  (relative  error),  respectively.  Among  the  topics  included  here  are  math¬ 
ematical  model,  statistical  model  and  data  assumptions,  and  some  techniques  (residual 
plots,  sensitivity  analysis,  model  comparison  tests)  for  verifying  these.  The  ideas  are 
illustrated  throughout  with  the  popular  logistic  growth  model  of  Verhulst  and  Pearl 
as  well  as  with  a  recently  developed  population  level  model  of  pneumococcal  disease 
spread. 


Keywords:  Inference,  least  squares  inverse  problems,  parameter  estimation,  sensi¬ 
tivity  and  generalized  sensitivity  functions. 


1  Introduction 

In  this  Chapter  we  discuss  mathematical  and  statistical  aspects  of  inverse  or  parameter 
estimation  problems.  While  we  briefly  discuss  maximum  likelihood  estimators  (MLE),  our 
focus  here  will  be  on  ordinary  least  squares  (OLS)  and  generalized  least  squares  (GLS)  estima¬ 
tion  formulations  and  issues  related  to  use  of  these  techniques  in  practice.  While  we  choose  a 
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general  nonlinear  ordinary  differential  equation  mathematical  model  to  discuss  concepts  and 
ideas,  the  discussions  are  also  applicable  to  partial  differential  equation  models  and  other 
deterministic  dynamical  systems.  As  we  shall  explain,  the  choice  of  an  appropriate  statisti¬ 
cal  model  is  of  critical  importance,  and  we  discuss  at  length  the  difference  between  constant 
variance  and  nonconstant  variance  noise  in  the  observation  process,  the  consequences  for 
incorrect  choices  in  this  regard,  and  computational  techniques  for  investigating  whether  a 
good  decision  has  been  made.  In  particular,  we  illustrate  use  of  residual  plots  to  suggest 
whether  or  not  a  correct  statistical  model  has  been  specified  in  an  inverse  problem  formu¬ 
lation.  We  illustrate  these  and  other  techniques  with  examples  including  the  well  known 
Verhulst-Pearl  logistic  population  model  and  a  specific  epidemiological  model  (a  pneumo¬ 
coccal  disease  dynamics  model).  We  discuss  the  use  of  sensitivity  equations  coupled  with  the 
asymptotic  theory  for  sampling  distributions  and  the  computation  of  associated  covariances, 
standard  errors  and  confidence  intervals  for  the  estimators  of  model  parameters.  We  also 
discuss  sensitivity  functions  ( traditional  and  generalized )  and  their  emerging  use  in  design  of 
experiments  for  data  specific  to  models  and  mechanism  investigation.  Traditional  sensitivity 
involves  sensitivity  of  outputs  to  parameters  while  the  recent  concept  of  generalized  sensi¬ 
tivity  in  inverse  problems  pertains  to  sensitivity  of  parameters  (to  be  estimated)  to  data  or 
observations.  That  is,  generalized  sensitivity  quantifies  the  relevance  of  data  measurements 
for  identification  of  parameters  in  a  typical  parameter  estimation  problem.  In  a  final  section 
we  present  and  illustrate  some  methods  for  model  comparison. 

2  Parameter  Estimation:  MLE,  OLS,  and  GLS 

2.1  The  Underlying  Mathematical  and  Statistical  Models 

We  consider  inverse  or  parameter  estimation  problems  in  the  context  of  a  parameterized 
(with  vector  parameter  6)  dynamical  system  or  mathematical  model 

d,x  — > 

~(t)  =  g(t,x(t),8)  (1) 

with  observation  process 

y(t)=Cx(t;6).  (2) 

Following  usual  convention  (which  agrees  with  the  data  usually  available  from  experiments), 

we  assume  a  discrete  form  of  the  observations  in  which  one  has  n  longitudinal  observations 
corresponding  to 

y(tj)  =  Cx(tj\  0),  j  —  1, . . .  ,n.  (3) 

In  general  the  corresponding  observations  or  data  {y)}  will  not  be  exactly  y(tj).  Because  of 
the  nature  of  the  phenomena  leading  to  this  discrepancy,  we  treat  this  uncertainty  pertaining 
to  the  observations  with  a  statistical  model  for  the  observation  process. 
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2.2  Description  of  Statistical  Model 

In  our  discussions  here  we  consider  a  statistical  model  of  the  form 

Yj  =  f{tj ,  Oo)  +  j  =  1, . . . ,  n,  (4) 

where  f(tj,9 )  =  Cx(tj',9),  j  =  1  corresponds  to  the  solution  of  the  mathematical 

model  (1)  at  the  jth  covariate  for  a  particular  vector  of  parameters  9  G  Rp,  x  G  RN ,  /  G  Rm, 
and  C  is  an  m  x  N  matrix.  The  term  90  represents  the  “truth”  or  the  parameters  that 
generate  the  observations  (The  existence  of  a  truth  parameter  90  is  standard  in 

statistical  formulations  and  this  along  with  the  assumption  that  the  means  E[ef\  are  zero 
yields  implicitly  that  the  (1)  is  a  correct  description  of  the  process  being  modeled.)  The  terms 
€j  are  random  variables  which  can  represent  measurement  error,  “system  fluctuations”  or 
other  phenomena  that  cause  observations  to  not  fall  exactly  on  the  points  f{tj,9)  from  the 
smooth  path  f(t,  9).  Since  these  fluctuations  are  unknown  to  the  modeler,  we  will  assume  e} 
is  generated  from  a  probability  distribution  (with  mean  zero  throughout  our  discussions)  that 
reflects  the  assumptions  regarding  these  phenomena.  For  instance,  in  a  statistical  model  for 
pharmacokinetics  of  drug  in  human  blood  samples,  a  natural  distribution  for  e  =  (ei, . . . ,  en)T 
might  be  a  multivariate  normal  distribution.  In  other  applications  the  distribution  for  e  might 
be  much  more  complicated  [22], 

The  purpose  of  our  presentation  here  is  to  discuss  methodology  related  to  the  estima¬ 
tion  of  the  true  value  of  the  parameters  9q  from  a  set  0  of  admissible  parameters,  and  its 
dependence  on  what  is  assumed  about  the  variance  var(e))  of  the  error  er  We  discuss  two 
inverse  problem  methodologies  that  can  be  used  to  calculate  estimates  9  for  9q:  the  ordinary 
least-squares  (OLS)  and  generalized  least-squares  (GLS)  formulations  as  well  as  the  popular 
maximum  likelihood  estimate  (MLE)  formulation  in  the  case  one  assumes  the  distributions 
of  the  error  process  (e)}  are  known. 

2.3  Known  error  processes:  Normally  distributed  error 

In  the  introduction  of  the  statistical  model  we  initially  made  no  mention  of  the  probability 
distribution  that  generates  the  error  er  In  many  situations  one  readily  assumes  that  the 
errors  e)  =  1,  ...,n,  are  independent  and  identically  distributed  (we  make  the  standing 
assumptions  of  independence  across  j  throughout  our  discussions  in  this  Chapter).  We 
discuss  a  case  where  one  is  able  to  make  further  assumptions  on  the  error,  namely  that 
the  distribution  is  known.  In  this  case,  maximum  likelihood  techniques  may  be  used.  We 
discuss  first  one  such  case  for  a  scalar  observation  system,  i.e.,  m  —  1.  If,  in  addition,  there 
is  sufficient  evidence  to  suspect  the  error  is  generated  by  a  normal  distribution  then  we  may 
be  willing  to  assume  6j  ~  A/"(0,  ctq),  and  hence  Yj  ~  A f{f(tj,  9 0),  Oq).  We  can  then  obtain  an 
expression  for  determining  90  and  a0  by  seeking  the  maximum  over  ( 9 ,  a2)  G  0  x  (0,  oo)  of 
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the  likelihood  function  for  €j  =  Yj  —  f(tj ,  6)  which  is  defined  by 

_  n  1  i 

U0,a2\Y)  =  n^pexP(-^5  K-/fe.»)l2}-  (5) 

The  resulting  solutions  0MLE  and  u^LE  are  the  maximum  likelihood  estimators  (MLEs)  for  do 
and  (Xq,  respectively.  We  point  out  that  these  solutions  C  =  tt)  md  <tLe  =  aj”  (K) 
are  random  variables  by  virtue  of  the  fact  that  Y  is  a  random  variable.  The  corresponding 
maximum  likelihood  estimates  are  obtained  by  maximizing  (5)  with  Y  =  (ly , . . .  ,Yn)T 
replaced  by  a  given  realization  y  =  (yi, . . .  ,yn)T  and  will  be  denoted  by  dMLE  =  d^LE  and 
d mle  =  b”LE  respectively.  In  our  discussions  here  and  below,  almost  every  quantity  of  interest 
is  dependent  on  n,  the  size  of  the  set  of  observations  or  the  sampling  size.  On  occasion  we 
will  express  this  dependence  explicitly  by  nse  of  superscripts  or  subscripts,  especially  when 
we  wish  to  remind  the  reader  of  this  dependence.  However,  for  notational  convenience  we 
will  often  suppress  the  notation  of  explicit  dependence  on  n. 

Maximizing  (5)  is  equivalent  to  maximizing  the  log  likelihood 

\ogL{6,cr2\Y)  =  -^log(2vr)  -  ^logu2  - —^[Yj  -  /(^,d)]2.  (6) 

j= i 

We  determine  the  maximum  of  (6)  by  differentiating  with  respect  to  6  (with  a2  fixed)  and 
with  respect  to  a2  (with  6  fixed),  setting  the  resulting  equations  equal  to  zero  and  solving 
for  6  and  a2.  With  a2  fixed  we  solve  -^\ogL(0,cr2\Y)  =  0  which  is  equivalent  to 

n 

£K-/M)1v/M)  =  o.  (t) 

j= i 

where  as  usual  V/  =  j^f  =  fy.  We  see  that  solving  (7)  is  the  same  as  the  least  squares 
optimization 

n 

#ml E(b')  =  arg min  J (Y , 6)  =  argmin  V'K  -  f(tj,6)]2.  (8) 

dee  eee 

We  next  fix  9  to  be  0MLB  and  solve  logT(6)MLE,  a2| Y)  =  0,  which  yields 

>?L,(y)  =  »l.).  (9) 

Note  that  we  can  solve  for  9MLE  and  u^LE  separately  -  a  desirable  feature,  but  one  that  does 
not  arise  in  more  complicated  formulations  discussed  below.  The  2nd  derivative  test  (which 
is  omitted  here)  verifies  that  the  expressions  above  for  dMLE  and  a2LE  do  indeed  maximize 
(6). 
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If,  however,  we  have  a  vector  of  observations  for  the  jth  covariate  tj  then  the  statistical 
model  is  reformulated  as 

Yj  =  f(tj,0o)  +  ej  (10) 

where  f^Rm  and 

Vo  =  var(ei)  =  diag^,  •  •  • ,  <rjm)  (11) 

for  j  =  1 , ,n.  In  this  setting  we  have  allowed  for  the  possibility  that  the  observation 
coordinates  YJ  may  have  different  constant  variances  af)  t .  i.e.,  af)  t  does  not  necessarily  have 
to  equal  k.  If  (again)  there  is  sufficient  evidence  to  claim  the  errors  are  independent  and 
identically  distributed  and  generated  by  a  normal  distribution  then  e]  ~  Mm{ 0,  Vo).  We  thus 
can  obtain  the  maximum  likelihood  estimators  0MLE  {{Yj})  and  UMLE  ({£,})  for  0O  and  V0  by 
determining  the  maximum  of  the  log  of  the  likelihood  function  for  e)  =  Yj  —  f(tj,  6)  defined 
by 


log  L(9,  V\{Yj  , . . . ,  YJ11}) 


lA  1 


“  o  Z  lo§  ali  -oE^-EK"  ^ 


n 

2 


i—  1 
m 


i=  1  j=i 


Y  loe°h  -  -  /Vs.  nVv-Ys  -  m, «)]. 

j= i 


i= 1 


Using  arguments  similar  to  those  given  for  the  scalar  case,  we  determine  the  maximum 
likelihood  estimators  for  90  and  V0  to  be 

n 

#mle  =  arg min  -  f(tj,  0)]rUML1E[Uj  -  f(tj,  9)}  (12) 

eee  j=i 

Umle  =  diag  -  f{tj,9M^E)]T^j  .  (13) 


Unfortunately,  this  is  a  coupled  system,  which  requires  some  care  when  solving  numerically. 
We  will  discuss  this  issue  further  in  Sections  2.4.2  and  2.4.5  below. 


2.4  Unspecified  Error  Distributions  and  Asymptotic  Theory 

In  Section  2.3  we  examined  the  estimates  of  6q  and  Vo  under  the  assumption  that  the  error 
is  normally  distributed,  independent  and  constant  longitudinally.  But  what  if  it  is  suspected 
that  the  error  is  not  normally  distributed,  or  the  error  distribution  is  unknown  to  the  modeler 
beyond  the  assumptions  on  E[Yj]  embodied  in  the  model  and  the  assumptions  made  on 
var(ej)  (as  in  most  applications)?  How  should  we  proceed  in  estimating  90  and  er0  (or  Vo) 
in  these  circumstances?  In  this  section  we  will  review  two  estimation  procedures  for  such 
situations:  ordinary  least  squares  (OLS)  and  generalized  least  squares  (GLS). 
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2.4.1  Ordinary  Least  Squares  (OLS) 

The  statistical  model  in  the  scalar  case  takes  the  form 

yi  =  m,9o)+ej  (14) 

where  the  variance  var(ej)  =  <Jq  is  assumed  constant  in  longitudinal  data  (note  that  the 
error’s  distribution  is  not  specified).  We  also  note  that  the  assumption  that  the  observation 
errors  are  uncorrelated  across  j  (i.e.,  time)  may  be  a  reasonable  one  when  the  observations 
are  taken  with  sufficient  intermittency  or  when  the  primary  source  of  error  is  measurement 
error.  If  we  define 


0OL s(Y)  =  c „(Y)  =  argmin^y  - /(tj.fl)]2  (15) 

KeU 

then  $ols  can  be  viewed  as  minimizing  the  distance  between  the  data  and  model  where  all 
observations  are  treated  as  of  equal  importance.  We  note  that  minimizing  in  (15)  corresponds 
to  solving  for  6  in 


£K-/M)]v/M)  =  o.  (is) 

3= 1 

We  point  out  that  0Ols  is  a  random  variable  (ej  =  Y)  —  f(tj ,  0)  is  a  random  variable);  hence 
if  is  a  realization  of  the  random  process  {Yj}j=1  then  solving 

n 

^ols  =  01 LS  =  arg min  Y^lVj  ~~  f{tj,  0)]2  (17) 

provides  a  realization  for  0o LS.  (A  remark  on  notation:  for  a  random  variable  or  estimator  6 
we  will  always  denote  a  corresponding  realization  or  estimate  with  an  over  hat,  e.g.,  0  is  an 
estimate  for  6.) 

Noting  that 

1  n 

(18) 

3= 1 

suggests  that  once  we  have  solved  for  0OLS  in  (15),  we  may  obtain  an  estimate  <TqLS  =  (t^le 
for  Uq. 

Even  though  the  error’s  distribution  is  not  specified  we  can  use  asymptotic  theory  to 
approximate  the  mean  and  variance  of  the  random  variable  6*0,^  [31].  As  will  be  explained 
in  more  detail  below,  as  n  — >  oo,  we  have  that 

Sols  =  Cs  ~  Vp(9„,  EJ)  «  A 0O,  ol[xnT%)  VlSo)]"1)  (19) 
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where  the  sensitivity  matrix  x(0)  =  Xn(d) 


{Xj'k}  is  defined  as 


df(tj,e ) 

d9k 


j  =  l,...,n,  k  =  l,...,p, 


and 

£?o  =  a20[nn  o]-1  (20) 

with 

=  lim  -XnT(0o)xn(0o),  (21) 

n—>  oo  Th 

where  the  limit  is  assumed  to  exist-see  [31].  However,  90  and  <Tq  are  generally  unknown,  so 
one  usually  will  instead  use  the  realization  y  =  (yi, . . .  ,yn)T  of  the  random  process  Y  to 
obtain  the  estimate 


0Ols  =  argrriinV^  -  f(tj,9)f 


0G© 


3= 1 


and  the  bias  adjusted  estimate 


n  —  p 


3= 1 


(22) 


(23) 


to  use  as  an  approximation  in  (19). 

We  note  that  (23)  represents  the  estimate  for  Og  of  (18)  with  the  factor  ^  replaced  by 
the  factor  — —  (in  the  linear  case  the  estimate  with  -  can  be  shown  to  be  biased  downward 
and  the  same  behavior  can  be  observed  in  the  general  nonlinear  case-  see  Chap.  12  of  [31] 
and  p.  28  of  [22]).  We  remark  that  (18)  is  true  even  in  the  general  nonlinear  case  (it  does 
not  rely  on  any  asymptotic  theories  although  it  does  depend  on  the  assumption  of  constant 
variance  being  correct). 

Both  6  =  0OLS  and  a2  =  <TqLS  will  then  be  used  to  approximate  the  covariance  matrix 

Y,n^tn  =  a2[xnT{9)xnm-\  (24) 

We  can  obtain  the  standard  errors  SE(90hS ,&)  (discussed  in  more  detail  in  the  next  section)  for 

the  kth  element  of  9OLS  by  calculating  SE(9QhS:k)  ~  \J Ekk-  Also  note  the  similarity  between 
the  MLE  equations  (8)  and  (9),  and  the  scalar  OLS  equations  (22)  and  (23).  That  is,  under 
a  normality  assumption  for  the  error,  the  MLE  and  OLS  formulations  are  equivalent. 

If,  however,  we  have  a  vector  of  observations  for  the  jth  covariate  tj  and  we  assume  the 
variance  is  still  constant  in  longitudinal  data,  then  the  statistical  model  is  reformulated  as 

Yj  —  f(tji  Oo)  +  €j  (25) 
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where  f^Rm  and 


V0  =  var(fj)  =  diaglcrjj, . . . ,  cftj 


(26) 


for  j  =  1  Just  as  in  the  MLE  case  we  have  allowed  for  the  possibility  that  the 

observation  coordinates  Y-  may  have  different  constant  variances  i.e.  cr'^j  does  not 
necessarily  have  to  equal  a'lk.  We  note  that  this  formulation  also  can  be  used  to  treat  the 
case  where  V0  is  used  to  simply  scale  the  observations,  i.e.,  V0  =  diagfdq, . . . ,  vm)  is  known. 
In  this  case  the  formulation  is  simply  a  vector  OLS  (sometimes  also  called  a  weighted  least 
squares  (WLS)).  The  problem  will  consist  of  finding  the  minimizer 


dOLS  =  argmin  -  f(tj:9)]TV0  1[Yj  -  f(tj,6)\ , 

r\  •  ■* 


KB 


(27) 


where  the  procedure  weights  elements  of  the  vector  Yj  —  f(tj,  9)  according  to  their  variability. 
(Some  authors  refer  to  (27)  as  a  generalized  least  squares  (GLS)  procedure,  but  we  will  make 
use  of  this  terminology  in  a  different  formulation  in  subsequent  discussions).  Just  as  in  the 
scalar  OLS  case,  9Q LS  is  a  random  variable  (again  because  e)  =  Yj  —  f(tj,6 )  is);  hence  if 
{yj}™=l  is  a  realization  of  the  random  process  {Y?}"=1  then  solving 


0ols  =  argmin ^[yj  -  f(tj ,  9)]TV0  1[yj  -  f(tj,  i 


ee© 


(28) 


i 


provides  an  estimate  (realization)  6  =  0o LS  for  90 LS.  By  the  definition  of  variance 


O0  =  diag  /J  (  ‘  ^T'Yj  -  f(tj ,  90)][Yj  -  f(tj,  d0)F 


3= 1 


so  an  unbiased  estimate  of  V0  for  the  realization  {i/)}”=1  is 

1 


V  =  diag 


n  —  p 


Lift  -  /ft •*)][»  -  /ft. 


(29) 


3- 1 


However,  the  estimate  9  requires  the  (generally  unknown)  matrix  Vo  and  Vo  requires  the 
unknown  vector  9q  so  we  will  instead  use  the  following  expressions  to  calculate  9  and  V : 


90~9  =  argmin  ~  f(tj,  6)}TV  1[Vj  ~  f(tj, 1 
6>e© 


(30) 


3= 1 

1 


n  —  v 

3= 1 


V0  Rj  V  —  diag 


(31) 


Note  that  the  expressions  for  9  and  V  constitute  a  coupled  system  of  equations,  which  will 
require  greater  effort  in  implementing  a  numerical  scheme. 

Just  as  in  the  scalar  case  we  can  determine  the  asymptotic  properties  of  the  OLS  estimator 
(27).  As  n  — >  oo,  0o LS  has  the  following  asymptotic  properties  [22,  31]: 

9OhS~Af(90,^),  (32) 


where 


-l 


yn  ~ 


d= i 


and  the  rn  x  p  matrix  Dj{9 )  =  D'HO)  is  given  by 


/  dh(tj,e)  dfi{tj,e)  \ 

1  dd1  ae  2  96»p  ' 


9fm(tj,0)  dfm(tj,8 ) 


901 


902 


dfm(tj,e ) 

aev 


(33) 


Since  the  true  value  of  the  parameters  90  and  Vo  are  unknown  their  estimates  9  and  V  will 
be  used  to  approximate  the  asymptotic  properties  of  the  least  squares  estimator  9OLS: 


0olS~a4(4s;)«a4(«,v 


(34) 


where 


-l 


'£E$0)V-'Di0) 


d= 1 


(35) 


The  standard  errors  can  then  be  calculated  for  the  kth  element  of  9OLS  (SE(9OLStk))  by 
SE(9OLStk)  ~  V^kk-  Again,  we  point  out  the  similarity  between  the  MLE  equations  (12) 
and  (13),  and  the  OLS  equations  (30)  and  (31)  for  the  vector  statistical  model  (25). 


2.4.2  Numerical  Implementation  of  the  OLS  Procedure 


In  the  scalar  statistical  model  (14),  the  estimates  9  and  &  can  be  solved  for  separately  (this 
is  also  true  of  the  vector  OLS  in  the  case  V0  =  where  Im  is  the  m  x  m  identity)  and 

thus  the  numerical  implementation  is  straightforward  -  first  determine  90 LS  according  to  (22) 
and  then  calculate  <TqLS  according  to  (23).  The  estimates  9  and  V  in  the  case  of  the  vector 
statistical  model  (25),  however,  require  more  effort  since  they  are  coupled: 


9  =  arg min  ~  f{tj,  0)}TV  *[$  -  f(tj,  9)\ 

V  =  diaS  iD&j  ~  d)]  Wi  ~  /(^>  ^)]T  j  • 


(36) 

(37) 


To  solve  this  coupled  system  the  following  iterative  process  will  be  followed: 
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1.  Set  W0'  =  I  and  solve  for  the  initial  estimate  9^  using  (36).  Set  k  —  0. 

2.  Use  9^>  to  calculate  using  (37). 

3.  Re-estimate  9  by  solving  (36)  with  V  =  UU+1)  to  obtain 

4.  Set  k  =  k  +  1  and  return  to  2.  Terminate  the  process  and  set  9Q LS  =  9<'k+1'>  when  two 
successive  estimates  for  9  are  sufficiently  close  to  one  another. 

2.4.3  Generalized  Least  Squares  (GLS) 

Although  in  Section  2.4.1  the  error’s  distribution  remained  unspecified,  we  did  however 
require  that  the  error  remain  constant  in  variance  in  longitudinal  data.  That  assumption 
may  not  be  appropriate  for  data  sets  whose  error  is  not  constant  in  a  longitudinal  sense.  A 
common  relative  error  model  that  experimenters  use  in  this  instance  for  the  scalar  observation 
case  [22]  is 


n  =  /fe.»o)(l  +  e3)  (38) 

where  E(Yj)  =  f(tj,90 )  and  var(Y))  =  <7q f2(tj,90)  which  derives  from  the  assumptions  that 
E[ej]  =  0  and  var(ej)  =  erg  .  We  will  say  that  the  variance  generated  in  this  fashion  is 
non-constant  variance.  The  method  we  will  use  to  estimate  9 0  and  <7q  can  be  viewed  as  a 
particular  form  of  the  Generalized  Least  Squares  (GLS)  method. 

To  define  the  random  variable  dGLS  the  following  equation  must  be  solved  for  the  estimator 

$GLS: 


wjlYj  -  fit J,  0GLS)r V/fo,  flGLS)  =  o,  (39) 

3= 1 

where  Yj  obeys  (38)  and  Wj  =  0GLS).  The  quantity  dGLS  is  a  random  variable,  hence 

if  {yj} ”=;i  is  a  realization  of  the  random  process  Yj  then  solving 

n 

£r2M)[»i  -  /M)]v/M)  =  o.  («) 

3= 1 


for  9  we  obtain  an  estimate  9a LS  for  dGLS. 

The  GLS  estimator  6*GLS  =  d”LS  has  the  following  asymptotic  properties  [22]: 


0GLs~A7p(do,S”) 


(41) 


where 

s; «  4  iFj(e„)W(g0)Fs(g0)) 


(42) 
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/  9/(tl,0)  9/(0  ,0) 

90i  902 


Fe<0)  =  Fm  = 


df{ti,6)  \ 


90- 


v 


df(tn,S)  df(tn,8) 
901  902 


df(tn,8) 

90d 


/ 


/  V/(i„9)T  \ 


V  V/(i„,9)T  J 


and  W  1(d)  =  diag  yf2(ti,  9), . . . ,  f2(tn,  9)J .  Note  that  because  90  and  <7q  are  unknown,  the 
estimates  9  =  §GLS  and  a2  =  <TqLS  will  be  used  in  (42)  to  calculate 

t"  =  a2  (F!(CI)W(8)Fg((l)' 
where  [22]  we  take  the  approximation 


(T2  ril  ff2  = 

°0  UGLS 


n 


P  i~i  9) 


[Vj  -  f(tj, 1 


We  can  then  approximate  the  standard  errors  of  0GLS  by  taking  the  square  roots  of  the 
diagonal  elements  of  E.  We  will  also  mention  that  the  solutions  to  (30)  and  (40)  depend 
upon  the  numerical  method  used  to  find  the  minimum  or  root,  and  since  So  depends  upon  the 
estimate  for  9q,  the  standard  errors  are  therefore  affected  by  the  numerical  method  chosen. 


2.4.4  GLS  motivation 

We  note  the  similarity  between  (16)  and  (40).  The  GLS  equation  (40)  can  be  motivated  by 
examining  the  weighted  least  squares  (WLS)  estimator 

n 

#WLS  =  argmin  -  f{tj,9)}2.  (43) 

e&eU  ' 

In  many  situations  where  the  observation  process  is  well  understood,  the  weights  {tu,-}  may  be 
known.  The  WLS  estimate  can  be  thought  of  minimizing  the  distance  between  the  data,  and 
model  while  taking  into  account  unequal  quality  of  the  observations  [22],  If  we  differentiate 
the  sum  of  squares  in  (43)  with  respect  to  9,  and  then  choose  Wj  =  f~2(tj,9),  an  estimate 
9gls  is  obtained  by  solving 

n 

-  f(tj,9)]Vf(tj,9)  =  0 

3= 1 

for  9.  However,  we  note  the  GLS  relationship  (40)  does  not  follow  from  minimizing  the 
weighted  least  squares  with  weights  chosen  as  w3  =  f~2(tj,9). 

Another  motivation  for  the  GLS  estimating  equation  (40)  can  be  found  in  [18].  In  the 
text  the  authors  claim  that  if  the  data  are  distributed  according  to  the  gamma  distribution, 
then  the  maximum-likelihood  estimator  for  9  is  the  solution  to 

n 

-  /(«;,«)]  V/(«„e)  =  0. 

3= 1 
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which  is  equivalent  to  (40).  The  connection  between  the  MLE  and  our  GLS  method  is 
reassuring,  but  it  also  poses  another  interesting  question:  What  if  the  variance  of  the  data 
is  assumed  to  not  depend  on  the  model  output  but  rather  on  some  function  g(tj,9 ) 

(i.e.,  var(lj)  =  <7q g2(tj,  6)  =  o\/wj)‘l  Is  there  a  corresponding  maximum  likelihood  estimator 
of  6  whose  form  is  equivalent  to  the  appropriate  GLS  estimating  equation  ( Wj  =  g~2(tj,9)) 

n 

-  /fe,#)]V/( tj,S)  =  0  ?  (44) 

3= 1 

In  their  text,  Carroll  and  Rupert  [18]  briefly  describe  how  distributions  belonging  to  the  expo¬ 
nential  family  of  distributions  generate  maximum-likelihood  estimating  equations  equivalent 
to  (44). 

2.4.5  Numerical  Implementation  of  the  GLS  Procedure 

Recall  that  an  estimate  0GLS  can  either  be  solved  for  directly  according  to  (40)  or  iteratively 
using  the  equations  outlined  in  Section  2.4.3.  The  iterative  procedure  as  described  in  [22]  is 
summarized  below: 

1.  Estimate  dGLS  by  9^  using  the  OLS  equation  (15).  Set  k  —  0. 

2.  Form  the  weights  Wj  =  f~2(tj,9 ^). 

3.  Re-estimate  9  by  solving 

n  2 

9(k+1)  =  argmm^^  (%•  -  f{tj,9)) 

3= 1 

to  obtain  the  k  +  1  estimate  for  0GLS. 

4.  Set  k  =  k  +  1  and  return  to  2.  Terminate  the  process  when  two  of  the  successive 
estimates  for  9G LS  are  sufficiently  close. 

We  note  that  the  above  iterative  procedure  was  formulated  by  minimizing  (over  9  e  0) 

n 

3= 1 

and  then  updating  the  weights  Wj  =  f~2(tj,9 )  after  each  iteration.  One  would  hope  that 
after  a  sufficient  number  of  iterations  Wj  would  converge  to  f~2(tj,  9GLS).  Fortunately,  under 
reasonable  conditions,  if  the  process  enumerated  above  is  continued  a  sufficient  number  of 
times  [22],  then  Wj  — >  f~2(tj,9GIjS). 
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3  Computation  of  Standard  Errors  and  Confidence 
Intervals 

We  return  to  the  case  of  n  scalar  longitudinal  observations  and  consider  the  OLS  case  of 
Section  2.4.1  (the  extension  of  these  ideas  to  vectors  is  completely  straight-forward).  These 
n  scalar  observations  are  represented  by  the  statistical  model 

Yj  =  f(tj,  0Q)  +  €j,  j  =  1,2,...,  n,  (45) 

where  f(tj,  Oq)  is  the  model  for  the  observations  in  terms  of  the  state  variables  and  6q  e  Mp 
is  a  set  of  theoretical  “true”  parameter  values  (assumed  to  exist  in  a  standard  statistical 
approach).  We  further  assume  that  the  errors  e3,  j  =  1,2 , ...  ,  n,  are  independent  identically 
distributed  ( i.i.d .)  random  variables  with  mean  E[e3]  =  0  and  constant  variance  var(ej)  =  <7q, 
where  <Jq  is  unknown.  The  observations  Yj  are  then  i.i.d.  with  mean  E[Yj\  =  f(tj,6o)  and 
variance  va r(Y))  =  a}y 

Recall  that  in  the  ordinary  least  squares  (OLS)  approach,  we  seek  to  use  a  realization 
{yj}  of  the  observation  process  {Yj}  along  with  the  model  to  determine  a  vector  0yLS  where 

n 

0no,s  =  arg  min  Jn{0)  =  Yyyi  ~  ^]2-  (46) 

1=1 

Since  Yj  is  a  random  variable,  the  corresponding  estimator  6n  =  9™LS  (here  we  wish  to  em¬ 
phasize  the  dependence  on  the  sample  size  n)  is  also  a  random  variable  with  a  distribution 
called  the  sampling  distribution.  Knowledge  of  this  sampling  distribution  provides  uncer¬ 
tainty  information  (e.g.,  standard  errors)  for  the  numerical  values  of  6n  obtained  using  a 
specific  data  set  {yj}.  In  particular,  loosely  speaking  the  sampling  distribution  characterizes 
the  distribution  of  possible  values  the  estimator  could  take  on  across  all  possible  realizations 
with  data  of  size  n  that  could  be  collected.  The  standard  errors  thus  approximate  the  extent 
of  variability  in  possible  values  across  all  possible  realizations,  and  hence  provide  a  measure 
of  the  extent  of  uncertainty  involved  in  estimating  6  using  the  specific  estimator  and  sample 
size  n  in  actual  data  collection. 

Linder  reasonable  assumptions  on  smoothness  and  regularity  (the  smoothness  require¬ 
ments  for  model  solutions  are  readily  verified  using  continuous  dependence  results  for  dif¬ 
ferential  equations  in  most  examples;  the  regularity  requirements  include,  among  others, 
conditions  on  how  the  observations  are  taken  as  sample  size  increases,  i.e.,  as  n  — >  oo),  the 
standard  nonlinear  regression  approximation  theory  ([22,  26,  29],  and  Chapter  12  of  [31]) 
for  asymptotic  (as  n  — >  oo)  distributions  can  be  invoked.  As  stated  above,  this  theory 
yields  that  the  sampling  distribution  for  the  estimator  6n(Y),  where  Y  =  {Y\ , . . .  ,Yn)T,  is 
approximately  a  /^multivariate  Gaussian  with  mean  E[9n(Y)]  «  60  and  covariance  matrix 
var (6n(Y))  «  Eg  =  o-gfnGo]”1  ~  cro[xnT(^o)xn(^o)]_1-  Here  xn(A)  =  F^9)  is  the  n  x  p 
sensitivity  matrix  with  elements 

xA»)  = ‘EtEl  and  FS(»)  =  frS(»))T, 
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where  =  |4( tj ,  9).  That  is,  for  n  large,  the  sampling  distribution  approximately  satisfies 


KixiY)  ~K(#o,K)  ~K(0o,<j2o[xnT(0o)xnm 


i-n 


(47) 


There  are  typically  several  ways  to  compute  the  matrix  Fg.  First,  the  elements  of  the 
matrix  y  =  (xjk)  can  always  be  estimated  using  the  forward  difference 


Xjk(9)  = 


9)  f(tj ,  9  +  hk)  -  f(tj,  9) 


89k 


|  h 


where  hk  is  a  p-vector  with  a  nonzero  entry  in  only  the  kth  component.  But,  of  course,  the 
choice  of  hk  can  be  problematic  in  practice. 

Alternatively,  if  the  f(tj,9 )  correspond  to  longitudinal  observations  y(tj)  =  Cx(tj]9 )  of 
solutions  x  G  MjV  to  a  parameterized  A-- vector  differential  equation  system  x  =  g(t,x(t),9 ) 
as  in  (1),  then  one  can  use  the  N  x  p  matrix  sensitivity  equations  (see  [4,  9]  and  the 
references  therein) 

d_  fdx\  =  dg_d^ +  dg_ 
dt\d9j  dxd  9  89 


to  obtain 


df(tj,9)  =cdx(tj,9) 


89  k 


89k 


Finally,  in  some  cases  the  function  f(tj,9 )  may  be  sufficiently  simple  so  as  to  allow  one  to 
derive  analytical  expressions  for  the  components  of  Fg. 

Since  9q,  oq  are  unknown,  we  will  use  their  estimates  to  make  the  approximation 


(wvwr  « s”(cs)  =  (es)V(cs)r, 


(49) 


where  the  approximation  a2  to  ctq,  as  discussed  earlier,  is  given  by 


al  «  a2  = 


n  —  p  *- 

3= 1 


- /ft.  Cs)]2- 


(50) 


Standard  errors  to  be  used  in  the  conhdence  interval  calculations  are  thus  given  by  SEk(9n)  = 
yji:kk(9n),  k  =  1,  2, . . .  ,p  (see  [19]). 

In  order  to  compute  the  conhdence  intervals  (at  the  100(1  —  a)%  level)  for  the  estimated 
parameters  in  our  example,  we  define  the  conhdence  level  parameters  associated  with  the 
estimated  parameters  so  that 

P{9nk  ~  ii-a/2 SEk{0n)  <  90k  <  9nk  +  F_a/2SEk{9n)}  =  1  -  a,  (51) 

where  a  G  [0,1]  and  ti_a/2  G  M+.  Given  a  small  a  value  (e.g.,  a  =  .05  for  95%  conhdence 
intervals),  the  critical  value  fi-a/2  is  computed  from  the  Student’s  t  distribution  tn~p  with 
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n  —  p  degrees  of  freedom.  The  value  of  n_Q/2  is  determined  by  P{T  >  fi-a/2}  =  a/2 
where  T  ~  tn~p.  In  general,  a  confidence  interval  is  constructed  so  that,  if  the  confidence 
interval  could  be  constructed  for  each  possible  realization  of  data  of  size  n  that  could  have 
been  collected,  100(1  —  a)%  of  the  intervals  so  constructed  would  contain  the  true  value 
9 ok-  Thus,  a  confidence  interval  provides  further  information  on  the  extent  of  uncertainty 
involved  in  estimating  60  using  the  given  estimator  and  sample  size  n. 

When  one  is  taking  longitudinal  samples  corresponding  to  solutions  of  a  dynamical  sys¬ 
tem,  the  n  x  p  sensitivity  matrix  depends  explicitly  on  where  in  time  the  observations  are 
taken  when  f(tj,6 )  =  Cx(tj,9)  as  mentioned  above.  That  is,  the  sensitivity  matrix 


X0)  =  FS{9) 


df(tj,9)\ 

dek  ) 


depends  on  the  number  n  and  the  nature  (for  example,  how  taken)  of  the  sampling  times 
{tj}.  Moreover,  it  is  the  matrix  [yTy]_1  in  (49)  and  the  parameter  a2  in  (50)  that  ultimately 
determine  the  standard  errors  and  confidence  intervals.  At  first  investigation  of  (50),  it 
appears  that  an  increased  number  n  of  samples  might  drive  b2  (and  hence  the  SE)  to  zero 
as  long  as  this  is  done  in  a  way  to  maintain  a  bound  on  the  residual  sum  of  squares  in  (50). 
However,  we  observe  that  the  condition  number  of  the  matrix  xTX  is  also  very  important 
in  these  considerations  and  increasing  the  sampling  could  potentially  adversely  affect  the 
inversion  of  xTX-  In  this  regard,  we  note  that  among  the  important  hypotheses  in  the 
asymptotic  statistical  theory  (see  p.  571  of  [31])  is  the  existence  of  a  matrix  function  Q(9) 
such  that 

Vo?V0  — ►  Q(0)  uniformly  in  9  as  n  — >  oo, 
n 

with  fl0  —  fi(0 o)  a  nonsingular  matrix.  It  is  this  condition  that  is  rather  easily  violated  in 
practice  when  one  is  dealing  with  data  from  differential  equation  systems,  especially  near  an 
equilibrium  or  steady  state  (see  the  examples  of  [4]). 

All  of  the  above  theory  readily  generalizes  to  vector  systems  with  partial,  non-scalar 
observations.  Suppose  now  we  have  the  vector  system  (1)  with  partial  vector  observations 
given  by  (5.1),  that  is,  we  have  m  coordinate  observations  where  m  <  N.  In  this  case,  we 
have 

d,x  -> 

-0 t)  =  g(t,x(t),9)  (52) 

and 

Vj  =  f(tj,  90)  +  €j  =  Cx(tj,  9o)  +  €j,  (53) 

where  C  is  an  mxN  matrix  and  /  G  Rm,  x  e  RN .  As  already  explained  in  Section  2.4.1,  if  we 
assume  that  different  observation  coordinates  ft  may  have  different  variances  of  associated 
with  different  coordinates  of  the  errors  6j,  then  we  have  that  e)  is  an  m-dimensional  random 
vector  with 

E[ej]  =  0,  var(ei)  =  V0, 
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where  V0  =  diag(org1,  ...,(t02J,  and  we  may  follow  a  similar  asymptotic  theory  to  calculate 
approximate  covariances,  standard  errors  and  confidence  intervals  for  parameter  estimates. 

Since  the  computations  for  standard  errors  and  confidence  intervals  (and  also  model 
comparison  tests )  depend  on  an  asymptotic  limit  distribution  theory ,  one  should  interpret  the 
findings  as  sometimes  crude  indicators  of  uncertainty  inherent  in  the  inverse  problem  findings. 
Nonetheless,  it  is  useful  to  consider  the  formal  mathematical  requirements  underpinning  these 
techniques. 

Among  the  more  readily  checked  hypotheses  are  those  of  the  statistical  model  requiring 
that  the  errors  €j,  j  =  1,2, ...  ,n,  are  independent  and  identically  distributed  ( i.i.d .)  random 
variables  with  mean  E[ej]  =  0  and  constant  variance  var(ej)  =  a/ 

•  After  carrying  out  the  estimation  procedures,  one  can  readily  plot  the  residuals  rj  = 
Vj  —  f(tj,0QLs )  vs ■  lnne  tj  and  the  residuals  vs.  the  resulting  estimated  model/  obser¬ 
vation  f(tj,0™LS)  values.  A  random  pattern  for  the  first  is  strong  support  for  validity 
of  independence  assumption;  a  non  increasing,  random  pattern  for  latter  suggests  as¬ 
sumption  of  constant  variance  may  be  reasonable. 

•  The  underlying  assumption  that  sampling  size  n  must  be  large  (recall  the  theory  is 
asymptotic  in  that  it  holds  as  n  — »  oo)  is  not  so  readily  “verified” -often  ignored  (albeit 
at  the  user’s  peril  in  regard  to  the  quality  of  the  uncertainty  findings). 

Often  asymptotic  results  provide  remarkably  good  approximations  to  the  true  sampling 
distributions  for  finite  n.  However,  in  practice  there  is  no  way  to  ascertain  whether  theory 
holds  for  a  specific  example. 

4  Investigation  of  Statistical  Assumptions 

The  form  of  error  in  the  data  (which  of  course  is  rarely  known)  dictates  which  method  from 
those  discussed  above  one  should  choose.  The  OLS  method  is  most  appropriate  for  constant 
variance  observations  of  the  form  Yj  =  f(tj,90 )  +  e3  whereas  the  GLS  should  be  used  for 
problems  in  which  we  have  nonconstant  variance  observations  Yj  =  f{tj,6/){l  +  e/). 

We  emphasize  that  in  order  to  obtain  the  correct  standard  errors  in  an  inverse  problem 
calculation,  the  OLS  method  (and  corresponding  asymptotic  formulas )  must  be  used  with 
constant  variance  generated  data,  while  the  GLS  method  (and  corresponding  asymptotic 
formulas )  should  be  applied  to  nonconstant  variance  generated  data. 

Not  doing  so  can  lead  to  incorrect  conclusions.  In  either  case,  the  standard  error  cal¬ 
culations  are  not  valid  unless  the  correct  formulas  (which  depends  on  the  error  structure) 
are  employed.  Unfortunately,  it  is  very  difficult  to  ascertain  the  structure  of  the  error,  and 
hence  the  correct  method  to  use,  without  a  priori  information.  Although  the  error  struc¬ 
ture  cannot  definitively  be  determined,  the  two  residuals  tests  can  be  performed  after  the 
estimation  procedure  has  been  completed  to  assist  in  concluding  whether  or  not  the  correct 
asymptotic  statistics  were  used. 
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4.1  Residual  Plots 


One  can  carry  out  simulation  studies  with  a  proposed  mathematical  model  to  assist  in 
understanding  the  behavior  of  the  model  in  inverse  problems  with  different  types  of  data  with 
respect  to  mis-specification  of  the  statistical  model.  For  example,  we  consider  a  statistical 
model  with  constant  variance  noise 


Yj  -  fitji&o)  + 


Var  (Yj)  = 


k2 


and  nonconstant  variance  noise 


10000 

k2 


v2, 


Y,  =  /((,-,  0 »)(1  +  —  €j),  Var(lj)  =  „). 


We  can  obtain  a  data  set  by  considering  a  realization  { y.j }”=1  of  the  random  process  {1}}”=1 
through  a  realization  of  {e^  }”=1  and  then  calculate  an  estimate  9  of  90  using  the  OLS  or  GLS 
procedure. 

We  will  then  use  the  residuals  r3  =  y3  —  f(tj,  9)  to  test  whether  the  data  set  is  i.i.d.  and 
possesses  the  assumed  variance  structure.  If  a  data  set  has  constant  variance  error  then 


Yj  =  f(tj,  ft)  +  €j  or  €j  =  Yj  -  f(tj,  do). 


Since  it  is  assumed  that  the  error  e3  is  i.i.d.  a  plot  of  the  residuals  r3  =  y3  —  f(tj,9)  vs.  tj 
should  be  random.  Also,  the  error  in  the  constant  variance  case  does  not  depend  on  f(tj,  9 0), 
and  so  a  plot  of  the  residuals  rq  =  y3  —  f(tj,  9)  vs.  f(tj,  9 )  should  also  be  random.  Therefore, 
if  the  error  has  constant  variance  then  a  plot  of  the  residuals  r3  =  y3  —  f(tj,d)  against  tj 
and  against  f(tj,d))  should  both  be  random.  If  not,  then  the  constant  variance  assumption 
is  suspect. 

We  turn  next  to  questions  of  what  to  expect  if  this  residual  test  is  applied  to  a  data  set 
that  has  nonconstant  variance  generated  error.  That  is,  we  wish  to  investigate  what  happens 
if  the  data  are  incorrectly  assumed  to  have  constant  variance  error  when  in  fact  they  have 
nonconstant  variance  error.  Since  in  the  nonconstant  variance  example,  Rj  =  Yj  —  f(tj,  90)  = 
f(tj,90)ej  depends  upon  the  deterministic  model  f(tj,90),  we  should  expect  that  a  plot  of 
the  residuals  r3  =  yj  —  f(tj,  9)  vs.  tj  should  exhibit  some  type  of  pattern.  Also,  the  residuals 
actually  depend  on  f(tj,  9)  in  the  nonconstant  variance  case,  and  so  as  f(tj,  9)  increases  the 
variation  of  the  residuals  r3  =  yj  —  f(tj,9 )  should  increase  as  well.  Thus  r3  =  yj  —  f(tj,9) 
vs.  f(tj,  9)  should  have  a  fan  shape  in  the  nonconstant  variance  case. 

In  summary,  if  a  data  set  has  nonconstant  variance  generated  data,  then 


Yj  =  f(tj>  0o)  +  fitj,  ft)  %  or  63 


Yj  ~  f  (tj,  d0) 
/Mo) 


If  the  distribution  e3  is  i.i.d.,  then  a  plot  of  the  modified  residuals  r™  =  (yj  —  f(tj,  d))/ f(tj,  9) 
vs.  tj  should  be  random  in  nonconstant  variance  generated  data.  A  plot  of  rf1  =  (yj  — 
f(tj,9))/f(tj,9 )  vs.  f(tj,9 )  should  also  be  random. 
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Another  question  of  interest  concerns  the  case  in  which  the  data  are  incorrectly  assumed 
to  have  nonconstant  variance  error  when  in  fact  they  have  constant  variance  error.  Since 
Yj  —  f(tj,9o )  =  6j  in  the  constant  variance  case,  we  should  expect  that  a  plot  of  r™  = 
(Vj  -  f(tj,d))/f(tj,9 )  vs.  tj  as  well  as  that  for  rf  =  (yj  -  f(tj,  0))/f{tj,  9)  vs.  f(tj,9 )  will 
possess  some  distinct  pattern. 

Two  further  issues  regarding  residual  plots:  As  we  shall  see  by  examples,  some  data  sets 
might  have  values  that  are  repeated  or  nearly  repeated  a  large  number  of  times  (for  example 
when  sampling  near  an  equilibrium  for  the  mathematical  model  or  when  sampling  a  periodic 
system  over  many  periods).  If  a  certain  value  is  repeated  numerous  times  (e.g.,  frepe at)  then 
any  plot  with  f(tj,9 )  along  the  horizontal  axis  should  have  a  cluster  of  values  along  the 
vertical  line  x  =  /repe at.  This  feature  can  easily  be  removed  by  excluding  the  data  points 
corresponding  to  these  high  frequency  values  (or  simply  excluding  the  corresponding  points 
in  the  residual  plots).  Another  common  technique  when  plotting  against  model  predictions 
is  to  plot  against  log f(tj,9)  instead  of  f{tj,9 )  itself  which  has  the  effect  of  “stretching  out” 
plots  at  the  ends.  Also,  note  that  the  model  value  f(tj,  9)  could  possibly  be  zero  or  very  near 
zero,  in  which  case  the  modified  residuals  R™  =  would  be  undefined  or  extremely 

large.  To  remedy  this  situation  one  might  exclude  values  very  close  to  zero  (in  either  the 
plots  or  in  the  data  themselves).  We  chose  here  to  reduce  the  data  sets  (although  this 
sometimes  could  lead  to  a  deterioration  in  the  estimation  results  obtained).  In  our  examples 
below,  estimates  obtained  using  a  truncated  data  set  will  be  denoted  by  9^’s  for  constant 
variance  data  and  0q“CsV  f°r  nonconstant  variance  data. 

4.2  Example  using  Residual  Plots 


We  illustrate  residual  plot  techniques  by  exploring  a  widely  studied  model  -  the  logistic 
population  growth  model  of  Verhulst/Pearl 

X 

x  =  rx(  1  — — ),  x(0)  =  x0.  (54) 


Here  K  is  the  population’s  carrying  capacity,  r  is  the  intrinsic  growth  rate  and  Xq  is  the 
initial  population  size.  This  well-known  logistic  model  describes  how  populations  grow  when 
constrained  by  resources  or  competition.  The  closed  form  solution  of  this  simple  model  is 
given  by 

x(t)  = _ Kx<** _ 

The  left  plot  in  Figure  1  depicts  the  solution  of  the  logistic  model  for  K  =  17.5,  r  =  .7  and 
Xo  =  1  for  0  <  t  <  25.  If  high  frequency  repeated  or  nearly  repeated  values  (i.e. ,  near  the 
initial  value  Xq  or  near  the  the  asymptote  x  =  K)  are  removed  from  the  original  plot,  the 
resulting  truncated  plot  is  given  in  the  right  panel  of  Figure  1  (there  are  no  near  zero  values 
for  this  function). 
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Figure  1:  Original  and  truncated  logistic  curve  with  K  =  17.5,  r  =  .7  and  xq  =  .1. 


True  Solution  vs.  Time  Truncated  True  Solution  vs.  Time 


For  this  example  we  generated  both  constant  variance  and  nonconstant  variance  noisy 
data  (  we  sampled  from  J\f( 0, 1)  random  variables  to  obtain  realizations  of  ej  )  and  obtained 
estimates  9  of  90  =  (K,  r,  xq )  by  applying  either  the  OLS  or  GLS  method  to  a  realization 
{yj}”=1  of  the  random  process  {Y)}™=1.  The  initial  guesses  6init  =  9^  along  with  estimates 
for  each  method  and  error  structure  are  given  in  Tables  1-4  (the  superscript  tcv  and  tncv 
denote  the  estimate  obtained  using  the  truncated  data  set).  As  expected,  both  methods  do 
a  good  job  of  estimating  90,  however  the  error  structure  was  not  always  correctly  specified 
since  incorrect  asymptotic  formulas  were  used  in  some  cases. 


Table  1:  Estimation  using  the  OLS  procedure  with  constant  variance  data  for  k  =  5. 


k 

— ' 

9  init 

9o 

/3cv 

UOLS 

SEfes) 

ntcv 

UOLS 

SE(te) 

5 

17 

17.5 

1.7500e+001 

1.5800e-003 

1.7494e+001 

6.4215e-003 

5 

.8 

.7 

7.0018e-001 

4.2841e-004 

7.0062e-001 

6.5796e-004 

5 

1.2 

.1 

9.9958e-002 

3.1483e-004 

9.9702e-002 

4.3898e-004 

Table  2:  Estimation  using  the  GLS  procedure  with  constant  variance  data  for  k  =  5. 


k 

— # 

^init 

o0 

/9CV 

^GLS 

SEfc) 

ntcv 

U  GLS 

SEfe) 

5 

17 

17.5 

1.7500e+001 

1.3824e-004 

1.7494e+001 

9.1213e-005 

5 

.8 

.7 

7.0021e-001 

7.8139e-005 

7.0060e-001 

1.6009e-005 

5 

1.2 

.1 

9.9938e-002 

6.6068e-005 

9.9718e-002 

1.2130e-005 
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Table  3:  Estimation  using  the  OLS  procedure  with  nonconstant  variance  data  for  k  —  5. 


k 

— * 

9  init 

0o 

nnev 

^OLS 

SE(fc) 

ntnev 

UOhS 

SEfe) 

5 

17 

17.5 

1.7499e+001 

2.2678e-002 

1.7411e+001 

7.1584e-002 

5 

.8 

.7 

7.0192e-001 

6.1770e-003 

7.0955e-001 

7.6039e-003 

5 

1.2 

.1 

9.9496e-002 

4.5115e-003 

9.4967e-002 

4.8295e-003 

Table  4:  Estimation  using  the  GLS  procedure  with  nonconstant  variance  data  for  k  —  5. 


k 

— * 

9  init 

0o 

nnev 

^GLS 

SEfc) 

ntnev 

UGLS 

SE(9S) 

5 

17 

17.5 

1.7498e+001 

9.4366e-005 

1.7411e+001 

3.1271e-004 

5 

.8 

.7 

7.0217e-001 

5.3616e-005 

7.0959e-001 

5.7181e-005 

5 

1.2 

.1 

9.9314e-002 

4.4976e-005 

9.4944e-002 

4.1205e-005 

When  the  OLS  method  was  applied  to  nonconstant  variance  data  and  the  GLS  method 
was  applied  to  constant  variance  data,  the  residual  plots  given  below  do  reveal  that  the  error 
structure  was  misspecihed.  For  instance,  the  plot  of  the  residuals  for  9^s  given  in  Figures 
4  and  5  reveal  a  fan  shaped  pattern,  which  indicates  the  constant  variance  assumption  is 
suspect.  In  addition,  the  plot  of  the  residuals  for  9™hS  given  in  Figures  6  and  7  reveal 
an  inverted  fan  shaped  pattern,  which  indicates  the  nonconstant  variance  assumption  is 
suspect.  As  expected,  when  the  correct  error  structure  is  specified,  the  i.i.d.  test  and  the 
model  dependence  test  each  display  a  random  pattern  (Figures  2,  3  and  Figures  8,  9). 

Also,  included  in  the  right  panel  of  Figures  2-9  are  the  residual  plots  with  the  truncated 
data  sets.  In  those  plots  only  model  values  between  one  and  seventeen  were  considered  (i.e. 
1  <  Hj  <  17).  Doing  so  removed  the  dense  vertical  lines  in  the  plots  with  f(tj,9 )  along  the 
x-axis.  Nonetheless,  the  conclusions  regarding  the  error  structure  remain  the  same. 
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Residual  Residual 


Figure  2:  Residual  plots:  Original  and  truncated  logistic  curve  for  with  k  —  5 


Residual  vs.  Time  with  OLS  &  CV  Data  Residual  vs.  Time  with  OLS  &  Truncated  CV  Data 


Figure  3:  Original  and  truncated  logistic  curve  for  9^ s  with  k  =  5. 


Residual  vs.  Model  with  OLS  &  CV  Data 


Model 


Residual  vs.  Model  with  OLS  &  Truncated  CV  Data 


Truncated  Model 
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Residual  Residual 


Figure  4:  Original  and  truncated  logistic  curve  for  8™™s  with  k  —  5. 


Residual  vs.  Time  with  OLS  &  NCV  Data  Residual  vs.  Time  with  OLS  &  Truncated  NCV  Data 


Figure  5:  Original  and  truncated  logistic  curve  for  0”™  with  k  —  5. 


Residual  vs.  Model  with  OLS  &  NCV  Data  Residual  vs.  Model  with  OLS  &  Truncated  NCV  Data 
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Figure  6:  Original  and  truncated  logistic  curve  for  with  k  —  5. 


Residual/Model  vs.  Time  with  GLS  &  CV  Data 


Residual/Model  vs.  Time  with  GLS  &  Truncated  CV  Data 


Figure  7:  Original  and  truncated  logistic  curve  for  0™  s  with  k  —  5. 


Residual/Model  vs.  Model  with  GLS  &  CV  Data 


Residual/Model  vs.  Model  with  GLS  &  Truncated  CV  Data 


Model 


Truncated  Model 
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Figure  8:  Original  and  truncated  logistic  curve  for  0”™  with  k  =  5 


Residual/Model  vs.  Time  with  GLS  &  NCV  Data 


Residual/Model  vs.  Time  with  GLS  &  Truncated  NCV  Data 


Time 


Figure  9:  Original  and  truncated  logistic  curve  for  0”™  with  k  =  5. 


Residual/Model  vs.  Model  with  GLS  &  NCV  Data 


Residual/Model  vs.  Model  with  GLS  &  Truncated  NCV  Data 


Truncated  Model 
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In  addition  to  the  residual  plots,  we  can  also  compare  the  standard  errors  obtained  for 
each  simulation.  At  a  quick  glance  of  Tables  1-4,  the  standard  error  of  the  parameter  K  in 
the  truncated  data  set  is  larger  than  the  standard  error  of  K  in  the  original  data  set.  This 
behavior  is  expected.  If  we  remove  the  “flat”  region  in  the  logistic  curve,  we  actually  discard 
measurements  with  high  information  content  about  the  carrying  capacity  K  [4].  Doing  so 
reduces  the  quality  of  the  estimator  K.  Another  interesting  observation  is  that  the  standard 
errors  of  the  GLS  estimate  are  more  optimistic  than  that  of  the  OLS  estimate,  even  when  the 
non-constant  variance  assumption  is  wrong.  This  example  further  solidifies  the  conclusion  we 
will  make  with  the  epidemiological  model  described  below  -  before  one  reports  an  estimate 
and  corresponding  standard  errors,  there  needs  to  be  some  assurance  that  the  proper  error 
structure  has  been  specified. 
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5  Pneumococcal  Disease  Dynamics  Model 

To  explore  these  ideas  in  the  context  of  epidemiology,  we  discuss  a  population  level  model 
of  pneumococcal  disease  dynamics  as  an  example.  This  model  has  previously  been  applied 
to  surveillance  data  available  via  the  Australian  National  Notifiable  Diseases  Surveillance 
System  in  [32],  Monthly  case  notifications  of  invasive  pneumococcal  disease  (IPD)  and  annual 
vaccination  information  were  used  to  estimate  unknown  model  parameters  and  to  assess  the 
impact  of  a  newly  implemented  vaccination  policy.  Here  we  illustrate,  with  this  example,  the 
effects  of  incorrect  versus  correct  statistical  models  assumed  to  represent  observed  data  in 
reporting  parameter  values  and  their  corresponding  standard  errors.  Most  importantly,  we 
discuss  relevant  residual  plots  and  how  to  use  these  to  determine  if  reasonable  assumptions 
on  observed  error  have  been  made. 


Figure  10:  Pneumococcal  infection  dynamics  with  vaccination. 

yl 


yfv 


In  this  model,  shown  in  Figure  10,  individuals  are  classified  according  to  their  epidemi¬ 
ological  status  with  respect  to  invasive  pneumococcal  diseases,  which  include  pneumonia, 
bacteremia,  meningitis  and  are  defined  as  the  presence  of  Streptococcus  pneumoniae  in  any 
normal  sterile  fluid  in  the  body.  Individuals  are  considered  susceptible,  or  in  the  S  class, 
in  the  absence  of  this  bacteria.  The  E  class  represents  individuals  whose  nasopharyngeal 
regions  are  asymptomatically  colonized  by  S.  pneumoniae ,  a  stage  that  is  typically  transient, 
but  always  precedes  infection.  Should  a  colony  of  S.  pneumoniae  be  successful  in  establishing 
an  infection,  the  individual  then  exhibits  a  clinical  condition  described  above,  and  is  then 
considered  infected  or  in  the  /  class.  We  consider  vaccines  which  prevent  progression  to 
infection,  or  possibly,  asymptomatic  colonization.  This  protection  is  not  complete,  and  the 
efficacy  with  which  this  is  accomplished  is  1  —  A  and  1  —  e,  respectively.  Once  vaccinated, 
individuals  may  enter  any  of  the  epidemiological  states,  Sy,  Ev,  and  Iy,  although  they  do 
so  with  altered  rates.  The  model  equations  (for  detailed  derivations,  see  [32])  are  given  by 
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^  =  A  -  pS-  +  Ev A| 1  +  —  +  +  7/  -  05  +  pSy  -  pS 

at  N 

(IE_  =  ^E  +  Ey  +  I  +  Iv  _aE_  1k(qe  _(f)E  +  pEy  _  jlE 
dSv  =  (j)S  -  epSvE  +  EvtI  +  IV  +  aEv  +  7 Iv  ~  pSy  -  pSy 


dt 

dEy 

dt 

dl 


=  efiSy- 


N 

E  +  Ey  +  /  +  /y 


N 


aEy  +  (j)E  —  pEy  —  5  K,(t)  Ey  —  pEy 


—  =  ln{t)E  -  (7  +  77  +  //)/ 


dl 


v 


dt 


=  8K{t)Ey  -  (7  +  rj  +  yLi)/y. 


(56) 

(57) 

(58) 

(59) 

(60) 
(61) 


Seasonality  of  invasive  pneumococcal  diseases  has  been  observed  and  studies  support  a 
seasonal  infection  rate,  k,  rather  than  a  seasonal  effective  contact  rate,  p.  Thus,  we  assume 
the  form 


K,(t)  =  Ko  (1  +  Hi  cos[o;(t  —  t)])  , 

for  K,(t)  to  reflect  seasonal  changes  in  host  susceptibility  to  pneumococcal  infection. 


5.1  Statistical  Models  of  Case  Notification  Data 

Monthly  case  notifications  f(tj,  9)  are  best  represented  as  integrals  of  the  new  infection  rates, 


[Ik(s)E(s)  +  5/s;(s)£,y(s)]  ds, 


(including  those  in  the  vaccinated  class)  over  each  month,  since  they  represent  the  number  of 
cases  reported  during  the  month  and  do  not  provide  any  information  on  how  long  individuals 
remain  in  an  infected  state.  We  use  these  data  to  estimate  9  =  (P,k 0,Hi)t .  Before  using  the 
model  with  surveillance  data,  we  test  the  model  and  methodology  capabilities  with  simulated 
“data” .  Following  the  procedures  in  the  logistic  example  discussions  in  Section  4,  we  generate 
data  according  to  two  statistical  models: 


Yj  —  f(tj,0o)  +  £j,  (62) 

Yj  =  f(tj,90)(  1  +  Ci),  (63) 

for  j  =  1,  ...,  n,  where  90  are  the  ‘true’  values  of  the  parameters  used  to  generate  the  data. 
In  both  (62)  and  (63),  the  e3  are  independent  and  identically  distributed  (■ i.i.d .)  random 
variables  with  E[e3]  =  0  and  var(e,)  =  cr(j.  In  model  (62),  however,  the  residual  is  then 
Rj  =  Yj  —  f(tj,90 )  =  €j  and  thus  Rj  satisfies  E[Rj\  =  0  and  var (Rj)  =  ctq.  As  before,  we 
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will  refer  to  this  error  with  constant  variance ,  or  CV.  The  second  case,  (63),  has  residuals 
of  the  form  Rj  =  Yj  —  f(tj,9o)  =  €jf(tj,0o),  so  the  residual  is  actually  proportional  to  the 
model,  f(tj,  90),  at  each  time  point  tj,  and  thus  this  is  an  example  of  error  with  nonconstant 
variance ,  or  NCV.  We  note  that  in  this  case  E[Rj\  =  0  and  var (Rj)  =  <7q f2(tj,90)  or  RY 
has  mean  zero  and  variance  <7g. 

For  illustration,  we  consider  the  same  four  cases  as  with  the  logistic  example  in  Section 
4: 

1.  OLS  estimation  of  9  using  data  generated  by  model  (62)  with  constant  variance  obser¬ 
vational  error:  9ols(Xcv)i 

2.  OLS  estimation  of  9  using  data  generated  by  model  (63)  with  nonconstant  variance 
observational  error:  90ls(Yncv ), 

3.  GLS  estimation  of  9  using  data  generated  by  model  (62)  with  constant  variance  obser¬ 
vational  error:  9gls0^cv)i 

4.  GLS  estimation  of  9  using  data  generated  by  model  (63)  with  nonconstant  variance 
observational  error:  9gls(Yncv)- 

We  compare  the  parameter  estimates  9  and  standard  errors  SE{9 )  obtained  in  each  case. 
Further  we  discuss  how  to  interpret  plots  of  rj  =  yj  —  f(tj,  9)  versus  tj  and  f(tj,  9)  to  assess 
whether  reasonable  assumptions  have  been  made  in  assuming  the  statistical  model  for  the 
data. 

5.2  Inverse  Problem  Results:  Simulated  Data 

Data  were  generated  with  n  =  60  time  points  (equivalent  to  five  years  of  data),  with  the  set 
of  parameters 


Error  was  added  to  the  forward  solution  according  to  two  statistical  models,  as  described  in 
Section  5.1.  In  the  case  of  constant  variance  observational  error,  the  error  is  scaled  to  the 
magnitude  of  the  model  but  not  in  a  time-dependent  manner.  In  this  case  we  generated  noisy 
data  by  sampling  from  a  A/"(0, 1)  distribution  (we  could  of  course  have  sampled  from  any 
other  random  variable).  Therefore,  for  constant  variance  error  of  about  k%  of  the  average 
magnitude  of  the  f(tj,90), 

So  in  this  case  Cj  ~  J\f( 0,  [ifgavg ',jf(tj,  #o)]2)  with  ej  (and  also  Rj)  i.i.d.  In  the  second  statis¬ 
tical  model,  the  error  depends  on  time  and  is  scaled  by  the  model  at  each  time  point,  i.e., 
the  error  is  relative.  In  this  case  the  error  is  added  to  the  observations  by 
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Table  5:  Parameter  estimates  from  data  with  constant  variance  CV  error. 


9 

O'o 

— * 

^init 

®ols 

SE(60ls ) 

Ogls 

SE(8gls ) 

P 

1.5 

1.55 

1.4845 

0.038 

1.51186 

0.017 

K0 

1.4e-3 

1.3e-3 

1.4188e-3 

2.1e-4 

1.3203e~3 

1.2e-4 

K 1 

0.55 

0.65 

0.56203 

0.050 

0.56047 

0.019 

RSS 

1.6831e4 

1.722e4 

Rj  =  f(tj ,  0o h  ~  f(tj,  0o)^m  l), 

with  €j  ~  J\f( 0,  $o)]2)>  and  again  the  e,-  are  i.i.d.,  but  now  the  77,  are  not  i.i.d.  This 

enables  ns  to  compare  different  types  of  error  on  the  same  scale:  one  independent  of  time 
and  observation  magnitude,  and  one  dependent  on  observation  magnitude,  and  thus  time. 
With  the  present  examples,  we  have  taken  k  =  10. 

The  results  from  using  an  OLS  and  GLS  estimator  with  data  generated  with  constant 
variance  error  are  displayed  in  Table  5,  and  the  fitted  model  solutions  displayed  in  Figure 
11.  Both  estimators  do  an  arguably  similar  job  at  producing  the  true  values,  that  is  Ools 
and  6gls  are  comparably  close  to  90.  The  standard  errors  SE{6gls)  for  the  GLS  estimator 
however,  are  all  smaller,  and  seem  to  indicate  that  the  corresponding  estimates  are  more 
“reliable”.  This,  however,  is  not  true  because  they  are  based  on  incorrect  formulae,  as  we 
shall  see  in  our  examination  of  the  error  plots  for  both  of  these  cases.  Note  that  from  Figure 
11  and  the  residual  sum  of  squares,  RSS,  in  both  cases,  there  is  no  clear  argument  from 
these  results  as  to  which  estimator  is  better  suited  for  use  with  the  data. 


Figure  11:  Best  fit  model  solutions  to  monthly  case  notifications  with  constant  variance  CV 
error. 

Best  fit  model  solution  using  OLS  estimation  Best  fit  model  solution  using  GLS  estimation 
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Tabic  6:  Parameter  estimates  from  data  with  nonconstant  variance  NCV  error. 


9 

O'o 

— * 

^init 

®OLS 

SE(0Ols ) 

Ogls 

SE(9gls ) 

P 

1.5 

1.55 

1.4876 

0.037 

1.4923 

0.0079 

K0 

1.4e-3 

1.3e-3 

1.4703e-3 

2.0e-4 

1.4301e~3 

7e-5 

Kl 

0.55 

0.65 

0.54531 

0.047 

0.54232 

0.012 

RSS 

1.6692e4 

1.676e4 

Figure  12:  Best  fit  model  solutions  to  monthly  case  notifications  with  nonconstant  variance 
NCV  error. 

Best  fit  model  solution  using  OLS  estimation  Best  fit  model  solution  using  GLS  estimation 


When  OLS  and  GLS  estimation  are  each  used  with  data  with  nonconstant  variance  error, 
the  parameters  and  standard  errors  in  Table  6  are  obtained,  and  the  plot  of  these  model 
solutions  over  the  generated  data  is  given  in  Figure  12.  Again,  one  estimator  does  not  do 
a  clearly  better  job  over  the  other  in  terms  of  predicting  parameter  values  closer  to  those 
used  to  generate  the  data.  However,  again,  the  standard  errors  from  the  GLS  estimation 
are  smaller  as  compared  to  those  of  the  OLS  estimation.  From  this,  it  would  seem  that  the 
GLS  estimation  would  always  give  ‘better’  parameter  values,  or  do  a  better  job  at  producing 
reliable  results.  However,  we  know  that  in  the  case  of  constant  variance  error,  the  GLS 
estimation  makes  some  incorrect  assumptions  on  the  data  generation  and  therefore,  the 
standard  errors  reported  there  would  give  a  false  sense  of  confidence  in  the  values  (indeed 
they  are  based  on  incorrect  asymptotic  formulae). 

5.2.1  Residual  Plots 

Here  we  illustrate  use  of  residual  plots  to  investigate  whether  our  assumptions  on  the  errors 
incurred  in  observation  of  data  are  correct  -  that  is,  whether  the  €j  are  i.i.d.  for  all  j  =  1, ...,  n, 
and  also  are  independent  of  the  observation  magnitude.  As  we  have  already  discussed  in 
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Section  4,  if  the  errors  are  i.i.d.  then  a  plot  of  the  residuals  r3  =  y3  —  versus  time 

tj  should  show  no  discernible  pattern.  Similarly,  a  plot  of  the  residual  r3  as  a  function  of 
the  model  values  f(tj,9)  should  be  random  if  there  is  no  relationship  between  these  two 
quantities.  While  use  of  the  OLS  estimation  tacitly  assumes  the  statistical  model  (62),  and 
therefore  the  residual  is  a  realization  of  the  error  random  variable,  this  is  not  true  of  the 
GLS  estimation.  In  that  case,  the  assumed  statistical  model  is  shown  in  (62)  with  e3  i.i.d. 
but  the  residual  r3  are  not  i.i.d.  for  all  j  =  1,  ...,  n.  Therefore,  in  the  case  of  GLS  we  should 
investigate  plots  of  the  the  residual/model  values,  Rj  =  instead  of  the  residuals. 

Figure  13:  Residual  ( Tj  =  i/j  —  f(tj,9 ))  plots  of  the  OLS  estimation  with  CV  data  (e,-  = 
Yj  —  o));  Left:  nontruncated,  Right:  truncated. 


OLS  estimation  with  CV  Data  OLS  estimation  with  CV  Data 


In  Figure  13,  we  see  the  relationship  between  the  residuals  and  time,  and  that  between 
residuals  and  the  model  values  when  the  OLS  estimation  procedure  is  applied  to  data  which 
has  been  generated  with  constant  variance  error.  In  both  the  top  and  bottom  panels  on 
the  left,  the  full  set  of  n  =  60  points  are  used,  while  on  the  right  hand  side,  only  one 
year,  or  n  =  12  points  have  been  used  for  the  estimation.  Both  top  panels  show  a  random 
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pattern,  so  the  errors  are  clearly  i.i.d.  But  in  the  bottom  left  plot,  we  observe  clustering  of 
residuals  around  certain  model  values,  although  there  is  no  clear  pattern  in  the  dependent 
variable,  just  in  the  independent  variable,  f(tj,9).  However,  we  recognize  that  this  is  due 
to  the  seasonality  of  the  data  and  model,  so  that  at  regular  repeated  time  points  over  many 
periods,  there  are  going  to  be  repeated  values  of  the  model.  As  evidence  of  this,  we  see 
that  when  only  one  period  is  plotted  (the  bottom  right  panel),  a  random  pattern  is  seen, 
and  we  confirm  that  the  errors  are  not  dependent  on  the  model  values.  Thus,  if  there  are 
vertical  bands  on  a  plot  such  as  this,  it  can  be  attributed  to  certain  model  values  repeating 
and  does  not  indicate  any  dependence  of  the  error  on  the  model  value.  To  check,  one  can 
simply  reduce  the  number  of  data  points  used  in  the  estimation  so  that  there  are  few  or  no 
repeated  values. 


Figure  14:  Residual  (r3  =  y3  —  f{t3,9))  plots  of  the  OLS  estimation  to  NCV  data  (e3  = 
^:/(^o));  Left:  nontruncated,  Right:  truncated. 
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When  OLS  estimation  is  carried  out  with  data  that  has  been  generated  according  to  the 
statistical  model  (63),  however,  the  independence  of  the  error  from  time  is  not  so  clear,  as 
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these  graphs  (Figure  14)  do  not  show  a  random  pattern.  While  there  is  no  clear  relation¬ 
ship,  there  is  some  randomness  in  the  residuals,  and  the  band  of  residuals  are  tighter,  not 
homogeneously  distributed  across  the  plot  as  in  Figure  13.  The  dependence  of  the  residuals 
on  model  value  magnitude  (seen  in  the  bottom  panels)  is  apparent  as  the  ry  clearly  increase 
with  increasing  model  values,  producing  a  fan  shape.  In  this  case  the  OLS  estimation  is 
used  incorrectly,  and  the  residual  plots  exhibit  a  clear  dependence  on  model  values  and  do 
not  confirm  independence  from  time. 

Figure  15:  Residual/Model  (  , . )  plots  of  the  GLS  estimation  to  CV  data  {e3  =  Y3  — 
f(tj,90 ));  Left:  nontruncated,  Right:  truncated. 


The  GLS  estimation  procedure,  however,  gave  smaller  standard  errors  regardless  of  the 
data  set  used,  and  therefore,  more  confidence  in  the  parameter  estimates.  However,  in  Figure 
15,  we  see  evidence  again  of  the  dependence  of  the  residuals  on  time  and  model  quantities, 
thus  indicating  that  our  assumptions  have  been  incorrect  for  GLS  estimation.  In  this  case, 
we  would  have  assumed  that  the  errors  are  proportional  to  the  observations,  thus  motivating 
a  GLS  estimator.  If  the  variance  is  constant  across  time  and  model  values,  and  the  GLS 
estimator  is  used,  we  should  expect  a  systematic  behavior  in  the  residual  plots.  Indeed,  the 
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plots  in  Figure  15  reveal  a  tight  band  of  points  in  the  rj  -  versus  t,  plots  and  the  reverse 

fan  shape  of  the  plot  of  the  residual/model  versus  the  model  values  f(tj,9).  This 

indicates  that  the  relations  which  give  us  the  parameter  estimates  and  their  standard  errors 
no  longer  hold  and  we  are  essentially  reporting  incorrect  values.  As  we  saw  in  Section  5.2, 
while  the  parameter  estimates  may  not  necessarily  be  poor,  the  reliability  provided  by  the 
standard  errors  is  incorrect. 


Figure  16:  Residual/Model  (  rj  -  )  plots  of  the  GLS  estimation  to  NCV  data  {e3  = 
Left:  nontruncated,  Right:  truncated. 

j{tj  ,6/q) 


GLS  estimation  with  NCV  Data  GLS  estimation  with  NCV  Data 


When  the  GLS  estimator  is  used  appropriately,  however,  the  randomness  of  the  error 
plots  suggest  reasonability  of  assumptions,  as  seen  in  Figure  16.  Here,  the  error  in  the 
data  has  been  generated  proportional  to  the  model  values,  and  therefore,  not  longitudinally 
constant.  So  when  we  plot  the  ratios  —  - J  ,  we  allow  for  this  dependence  and  see  the 

/  \tj  W/ 

random  patterns  we  would  expect  when  plotting  realizations  of  a  random  variable.  Again, 
the  vertical  bands  seen  in  the  bottom  left  panel  indicate  repeated  model  values,  as  can  be 
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seen  by  the  bottom  right  panel,  where  the  repetitions  have  been  excluded  from  the  data  set. 

5.3  Inverse  Problem  Results:  Australian  Surveillance  Data 

Using  the  iterative  weighted  least  squares  procedure  described  in  Section  2.4.2,  we  carried 
out  inverse  problem  calculations  with  the  model  and  observations  as  outlined  in  the  previous 
section  using  Australian  IPD  data  in  place  of  the  simulated  data.  In  this  case  we  assumed 
constant  variance  noise  in  the  data  and  hence  used  WLS  ,  e.g.,  see  (27),  for  our  estimation 
procedure.  Details  are  given  in  [32],  We  discuss  here  the  case  where  we  used  data  for  the 
period  2002-2004  (36  months  of  monthly  data  n \  =  36,  and  ri2  =  6  of  annual  vaccinated  or 
unvaccinated  cases)  and  estimated  9  =  (/?,  Ko,  ki,  5)t  along  with  <71,02  in  a  weighted  least 
squares  (WLS)  functional 


As  usual,  we  assume  there  exists  a  ‘true’  parameter  90  which  generated  the  data,  and  our 
statistical  model  is  then  given  by 

YV  =  /(1)(b,  0o)  +  ef  j  =  1, ...,  36,  (65) 

Yfc(2)  =  /(2>(t*,fl d)  +  42)  A:  =  1,2,3,  (66) 

Y®  =  f®(tk,0o)  +  e®  k  =  1,2,3.  (67) 

The  errors  (e  -  )  in  (65)  -  (67)  for  %  =  1,2, 3)  in  the  above  model  are  assumed  to  be 

random  variables  with  means  E[e^]  =  0  and  constant  variances  var(e^)  =  where 
o’oq  =  o-!,  0-0,2  =  <7o,3  =  cr2  are  unknown.  Thus  we  have  assumed  that  the  size  of  the  errors 
committed  at  each  time  for  a  given  kind  of  “measurement”  is  constant  and  also  does  not 
depend  on  the  magnitude  of  the  measurement  itself.  We  also  assume  that  are  independent 
and  identically  distributed  (■ i.i.d .)  random  variables  for  each  fixed  i.  The  observations  and 
the  model  quantities  are  related  by 

•  Y'j1'1  ~  /<%■,  6)  =  f*3+1  [k(s)E(s)  +  <5/c(s)£Jy(s)]  ds  for  j  =  1,  2, ..,  36  (monthly  cases), 

•  Yf‘2)  ~  =  f^+1  K(s)E(s)ds  for  k  —  1,2,3  (yearly  unvaccinated  cases), 

•  Y^]  ~  =  j^k+1  5hi(s)Ev(s)ds  for  k  —  1,2,3  (yearly  vaccinated  cases). 

The  data  fits  in  Figure  17  reveal  that  the  model  solution  with  the  parameters  shown  in  Table 
7  fits  the  Australian  surveillance  data  from  2002-2004,  with  the  top  panel  showing  the  fit  to 
the  monthly  case  notification  data,  the  bottom  left  panel  the  unvaccinated  cases  reported 
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Table  7:  Model  calibration  to  Australian  IPD  data  from  2002-2004;  estimation  of  ip  = 

(6,  di,  d2)T  =  0,  d0,  «i,  5,  di,  d^ _ 


Ip 

Ip 

SE(6) 

(3 

1.52175 

0.02 

K0 

1.3656e-3 

1.3e-4 

0.56444 

0.04 

5 

0.7197 

0.06 

<7\ 

28.924 

°2 

86.386 

annually,  and  the  bottom  right  the  annual  vaccinated  cases.  The  model  solution  and  data 
agree  well,  and  parameter  values  are  on  the  scale  of  our  initial  guesses,  although  their  values 
differ  slightly  to  minimize  the  cost  function  in  the  functional  (64).  Further,  the  standard 
errors  are  relatively  small  and  suggests  that  the  estimates  obtained  here  are  reliable. 


Figure  17:  Best  fit  solution  to  Australian  IPD  data  with  parameters  shown  in  Table  7.  Top 
panel:  monthly  cases;  bottom  left  panel:  annual  unvaccinated  cases;  bottom  right  panel: 
annual  vaccinated  cases. 
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To  test  the  assumptions  of  the  statistical  model  that  we  have  chosen  to  represent  our 
data,  we  plotted  the  residuals  between  the  model  and  observations  as  a  function  of  the  model, 
that  is,  r.j  =  y- 1  ^  —  f^(tj,9)  vs.  the  model  values  f^(tj,0)  (Figure  18).  The  lack  of  a  clear 
relationship  between  these  two  quantities  suggests  that  our  assumptions  are  reasonable  and 
the  residuals  of  each  observation  do  not  depend  on  the  model  values.  However,  we  see  six 
groups  of  points,  which  can  be  explained  by  the  oscillatory  pattern  of  the  infections.  In 
the  top  panel  we  have  plotted  just  one  half  of  the  period  of  the  infection  rate  and  see  a 
completely  random  pattern,  indicating  no  relationship  among  these  quantities.  When  we 
extend  this  time  period  for  another  half  of  a  period,  thus  plotting  an  entire  period  in  the 
middle  panel,  we  see  that  there  are  two  points  in  each  group  of  points.  Thus,  the  pattern 
observed  is  driven  by  the  seasonality  of  the  infections  and  not  by  any  incorrect  assumptions. 
On  the  contrary,  only  a  pattern  in  the  dependent  variable  (the  residuals)  would  suggest  that 
incorrect  assumptions  have  been  made.  This  analysis  suggests  that  it  is  reasonable  to  assume 
constant  variance  among  observations  of  the  same  type,  providing  support  for  the  statistical 
model  underlying  the  parameter  estimation  procedure. 


Figure  18:  Residuals  as  a  function  of  model  values.  Top  panel  is  over  the  period  January 
2003  through  June  2003,  middle  panel  is  for  January  2003  through  December  2003,  and 
bottom  panel  is  for  all  three  years. 
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6  Sensitivity  Functions 

The  sensitivity  matrices  x  =  introduced  in  Section  3  to  define  covariances  for  sampling 
distributions  and  associated  standard  errors  are  actually  well  known  in  the  applied  mathe¬ 
matics  and  engineering  literature,  where  they  arise  in  routine  sensitivity  analysis. 

In  actuality,  sensitivity  analysis  is  an  ensemble  of  techniques  [30]  that  can  provide  infor¬ 
mation  on  parameter  dependent  model  behavior,  yielding  a  much  better  understanding  of 
the  underlying  mathematical  model  with  a  resulting  marked  improvement  in  the  estimation 
results  obtained  using  the  models  in  simulations  and  inverse  problems.  Traditionally,  sen¬ 
sitivity  analysis  referred  to  a  procedure  used  in  simulation  studies  (direct  problems)  where 
one  evaluated  the  effects  of  parameter  variations  on  the  time  course  of  model  outputs  and 
identified  the  parameters  or  the  initial  conditions  to  which  the  model  is  most /least  sensitive. 
In  recent  years  however,  investigators’  attention  has  also  recently  turned  to  the  sensitivity 
of  the  solutions  to  inverse  problems  with  respect  to  data,  in  a  quest  for  optimal  selection 
of  data  measurements  in  experimental  design.  As  part  of  model  validation  and  verification, 
one  typically  needs  to  estimate  model  parameters  from  data  measurements,  and  a  related 
question  of  paramount  interest  is  related  to  sampling;  specifically,  at  which  time  points  the 
measurements  are  most  informative  in  the  estimation  of  a  given  parameter.  Due  to  the 
fact  that  in  practice  the  components  of  the  parameter  estimates  are  often  correlated,  tradi¬ 
tional  sensitivity  functions  (TSF)  used  alone  are  not  very  efficient  in  answering  this  question 
because  TSF  do  not  take  into  account  how  model  output  variations  affect  parameter  esti¬ 
mates  in  inverse  problems.  Investigators  [11,  34]  recently  proposed  a  new  class  of  sensitivity 
functions,  called  generalized  sensitivity  functions  (GSF),  which  provide  information  on  the 
relevance  of  measurements  of  output  variables  of  a  system  for  the  identification  of  specific 
parameters.  For  a  given  set  of  time  observations,  Thomaseth  and  Cobelli  use  theoretical 
information  criteria  (the  Fisher  information  matrix)  to  establish  a  relationship  between  the 
monotonicity  of  the  GSF  curves  with  respect  to  the  model  parameters  and  the  information 
content  of  these  observations.  Here  our  interest  is  in  how  to  use  this  information  content  tool 
along  with  TSF  to  improve  data  collection  for  estimation  of  parameters  in  inverse  problems. 
It  is,  of  course,  intuitive  that  sampling  more  data  points  from  the  region  indicated  by  the 
GSF  to  be  the  “most  informative”  with  respect  to  a  given  parameter  would  result  in  more 
information  about  that  parameter,  and  therefore  provide  more  accurate  estimates  for  it. 

To  define  and  discuss  these  sensitivity  functions  we  consider  the  general  mathematical 
model  (1)  with  iV-vector  solutions  x  depending  on  p- vector  parameters  6. 

6.1  Traditional  Sensitivity  Functions 

Traditional  sensitivity  functions  (TSF)  are  classical  sensitivity  functions  used  in  mathemat¬ 
ical  modeling  to  investigate  variations  in  the  output  of  a  model  resulting  from  variations  in 
the  parameters  and  the  initial  conditions. 

In  order  to  quantify  the  variation  in  the  state  variable  x{t)  with  respect  to  changes  in 
the  parameter  9  and  the  initial  condition  T0>  we  are  naturally  led  to  consider  traditional 
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sensitivity  functions  (TSF)  as  defined  by  the  derivatives 


sok(t)  = 

dx  .  , 

wk^ 

k  —  1, . 

■  ■  ,P, 

(68) 

rXo  ,{t)  = 

dx  .  . 

dJth 

1  =  1,. 

..,N, 

(69) 

where  Xqi  is  the  Z-th  component  of  the  initial  condition  x0.  If  the  function  g  is  sufficiently 
regular,  the  solution  x  is  differentiable  with  respect  to  and  Xqi,  and  therefore  the  sensitivity 
functions  sgk  and  rXol  are  well  defined. 

In  practice,  the  model  under  investigation  often  is  sufficiently  simple  to  allow  one  to 
compute  analytically  the  sensitivity  functions  (68)  and  (69).  This  is  precisely  the  case  (see 
(55))  with  the  logistic  growth  population  example  of  (54)  to  be  discussed  below.  However, 
when  one  deals  with  a  more  complex  model,  as  with  the  epidemiological  example  of  Section 
5,  it  is  often  preferable  to  consider  these  sensitivity  functions  separately  for  clarity  purposes. 

The  sensitivity  functions  are  local  in  nature  because  they  are  defined  by  partial  derivatives 
which  have  a  local  character.  Thus  sensitivity  and  insensitivity  (i.e.,  sgk  =  dx/dQk  very  close 
to  zero)  depend  on  the  time  interval,  the  state  values  x,  and  the  values  of  6  for  which  they 
are  considered.  For  example  in  a  certain  time  subinterval  we  might  find  that  sgk  is  small  so 
that  the  state  variable  x  is  insensitive  to  the  parameter  6 j.  on  that  particular  interval.  The 
same  function  sgk  can  take  large  values  on  a  different  subinterval,  indicating  that  the  state 
variable  x  is  quite  sensitive  to  the  parameter  6k  on  the  latter  interval.  From  the  sensitivity 

analysis  theory  for  dynamical  systems,  one  finds  (e.g.,  see  (48))  that  s  =  (s^, _ ,  sg  )  is  an 

N  x  p  vector  function  that  satisfies  the  matrix  ODE  system 

H*)  =  gx(t,x(t),0)s(t)  +  g§(t,x(t)J&)J  (70) 

s(t0)  0  Nxpi 

so  that  the  dependence  of  s  on  (t,x(t))  as  well  as  6  is  readily  apparent.  Here  we  have 
used  gg  =  dg/dx  and  g(j  =  dg/dd  to  denote  the  derivatives  of  g  with  respect  to  x  and  9 , 
respectively. 

The  sensitivity  functions  with  respect  to  the  components  of  the  initial  condition  xq  define 
an  N  x  N  vector  function  r  =  (rX01, . . . ,  fXoN ) ,  which  satisfies  the  matrix  system 

f{t)  =  &(*,£(*),  0M*)>  (71) 

r(t  o)  =  Inxn  ■ 

Equations  (70)  and  (71)  can  be  used  in  conjunction  with  equation  (1)  to  numerically  com¬ 
pute  the  sensitivities  s  and  r  for  general  cases  when  the  function  g  is  sufficiently  complicated 
to  prohibit  an  analytical  solution. 

In  many  cases  the  parameters  have  different  units  and  the  state  variables  may  have 
varying  orders  of  magnitude,  and  thus  in  practice  it  is  sometimes  more  convenient  to  work 
with  the  scaled  versions  of  the  TSF,  referred  to  as  relative  sensitivity  functions  (RSF). 
However,  here  we  will  focus  solely  on  the  non-scaled  sensitivities,  i.e.,  TSF. 
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6.2  Generalized  Sensitivity  Functions 

Recently  generalized  sensitivity  functions  were  proposed  by  Thomaseth  and  Cobelli  [34]  as 
a  new  tool  in  identification  studies  to  analyze  the  distribution  of  the  information  content 
(with  respect  to  the  model  parameters)  of  the  output  variables  of  a  system  for  a  given  set 
of  observations.  These  are  formulated  in  the  context  of  an  OLS  inverse  problem  framework 
in  [11,  34], 

We  consider  here  a  scalar  observation  model  with  discrete  time  measurements.  When 
m  —  1  and  C  is  a  1  x  N  array  in  (4)),  the  generalized  sensitivity  functions  (GSF)  are  defined 
as 

1  1  ^  ^ 

gs (t,)  =  V  X  V,-/(Mo)]  .  vy(i„«o),  (72) 

where  {ti},  l  =  1 , ,n  are  the  times  when  the  measurements  are  taken, 

n  i 

F  =  y  (73) 

3= 1  '  3> 

is  the  corresponding  p  x  p  Fisher  information  matrix  and  c2(tj)  is  the  observation  time 
dependent  variance.  The  symbol  represents  element-by-element  vector  multiplication 
(for  motivation  and  details  which  lead  to  the  definition  above,  the  interested  reader  may 
consult  [11,  34]).  The  Fisher  information  matrix  measures  the  information  content  of  the 
data  corresponding  to  the  model  parameters.  In  (72)  we  see  that  this  information  is  contained 
in  the  GSF,  making  them  appropriate  tools  to  indicate  the  relevance  of  the  measurements 
to  estimation  of  a  parameter  in  inverse  problems. 

We  observe  that  the  generalized  sensitivity  functions  (72)  are  vector-valued  functions 
with  the  same  dimension  as  6.  The  k-th  component  gsk  of  the  vector  function  gs  represents 
the  generalized  sensitivity  function  with  respect  to  6k-  The  GSF  in  (72)  are  defined  only 
at  the  discrete  time  points  {tj,j  =  l,...,n}  and  they  are  cumulative  functions  involving 
at  time  ti  only  the  contributions  of  those  measurements  up  to  and  including  tp  thus  gSk 
calculates  the  influence  of  measurements  up  to  on  the  parameter  estimate  for  0k- 

It  is  readily  seen  from  the  definition  that  all  the  components  of  gs  are  one  at  the  final 
time  point  tn,  i.e. ,  gs (tn)  =  1.  If  one  defines  gs (t)  =  0  for  t  <t\  (naturally,  gs  is  zero  when 
no  measurements  are  collected),  then  each  component  of  gs  transitions  (not  necessarily 
monotonically)  from  zero  to  one.  As  developed  in  [11,  34],  the  time  subinterval  during  which 
the  change  in  ry.sy.  has  the  sharpest  increase  corresponds  to  the  observations  which  provide  the 
most  information  in  the  estimation  of  9k.  That  is,  regions  of  sharp  increases  in  gsk  indicate 
a  high  concentration  of  information  in  the  data  about  9k-  Thus,  the  utility  of  these  functions 
in  design  of  experiments  is  rather  obvious. 

The  numerical  implementation  of  the  generalized  sensitivity  functions  (72)  is  straight¬ 
forward,  since  the  gradient  of  /  with  respect  to  6  (or  To)  is  simply  the  Jacobian  of  x  with 
respect  to  9  (or  x0 )  multiplied  by  the  observation  operator  C.  These  Jacobian  matrices  can 
be  obtained  by  numerically  solving  the  sensitivity  ODE  system  (70)  or  (71)  coupled  with  the 
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system  (1).  One  would  need  to  use  this  approach  to  compute  the  GSF  for  the  epidemiological 
model  of  Section  5.  For  the  the  logistic  model  used  below  to  illustrate  ideas,  the  solution  of 
equation  (54)  given  by  (55)  is  sufficiently  simple  to  permit  an  analytical  representation  of 
the  Jacobians. 

6.3  TSF  and  GSF  for  the  Logistic  Equation 

The  Verhulst-Pearl  logistic  equation  (54)  is  a  relatively  simple  example  with  easily  deter¬ 
mined  dynamics  that  is  useful  in  demonstrating  the  utility  of  the  traditional  sensitivity 
functions  as  well  as  the  generalized  sensitivity  functions  in  inverse  problems  (see  [2,  4]  for 
more  discussions  on  TSF  and  GSF  for  this  example).  Unless  data  are  sampled  from  regions 
with  changing  dynamics,  it  is  possible  that  some  of  the  parameters  will  be  difficult  to  es¬ 
timate.  Moreover,  the  parameters  that  are  obtainable  may  have  high  standard  errors  as  a 
result  of  introducing  redundancy  in  the  sampling  region  (this  is  illustrated  in  [2]).  In  order  to 
investigate  sensitivity  for  the  logistic  growth  example,  we  will  examine  varying  behavior  in 
the  model  depending  on  the  region  from  which  tj  is  sampled.  We  consider  points  T\  and  r2, 
as  depicted  in  Figure  19,  partitioning  the  logistic  solution  curve  into  three  distinct  regions: 
0  <  tj  <  7i,  7i  <  tj  <  t2,  and  r2  <  t3  <  T,  with  T  sufficiently  large  for  our  solution  to  be 
near  its  asymptote  x  =  Ii.  Based  on  the  changing  dynamics  of  the  curve  in  Figure  19,  we 
expect  differences  in  the  ability  to  estimate  parameters  depending  on  the  region  in  which 
the  solution  is  observed. 

Figure  19:  Regions  with  different  growth  in  the  Verhulst-Pearl  solution  curve. 


We  consider  the  logistic  model  with  true  parameters  6 0  =  (17.5, 0.7,  0.1).  We  analyze  the 
TSF  corresponding  to  each  parameter  in  the  initial  region  of  the  curve,  where  the  solution 
approaches  Xq  as  t  — >  0.  When  we  consider  the  initial  region  of  the  curve,  where  0  <  tj  <  T\ 
for  j  —  1, . . . ,  n,  we  have 


dx(tj) 

<)K 


0, 


dr  ’  dx0 
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this  follows  from  considering  the  limits  of  the  readily  computed  analytical  sensitivity  func¬ 
tions  as  t  — »  0.  Based  on  these  analytical  findings,  which  indicate  low  sensitivities  with 
respect  to  K  and  r,  we  expect  to  have  little  ability  to  determine  these  parameters  when  we 
sample  data  from  [0,  Ti];  however  we  should  be  able  to  estimate  xo-  This  is  confirmed  by  the 
computational  examples  in  [2,  4], 

We  next  consider  the  region  of  the  curve  which  is  near  the  asymptote  at  x  =  K,  in  this 
case  for  r2  <  tj  <  T,  j  =  1, . . . ,  n.  Here  we  find  that  by  considering  the  limits  as  t  —>  oo,  we 
have  the  approximations 

dx(tj )  ^  1  dxjtj)  _  dx(tj)  _ 

dK  ’  dr  ’  dx0 

Based  on  these  approximations,  we  expect  to  be  able  to  estimate  K  well  when  we  sample 
data  from  [t2,T].  However,  using  data  only  from  this  region,  we  do  not  expect  to  be  able 
to  estimate  very  well  either  Xo  or  r.  Again  these  expectations  are  readily  confirmed  by  the 
inverse  problem  calculations  presented  in  [2,  4]. 

Finally,  we  consider  the  part  of  the  solution  curve  where  T\  <  tj  <  r2  for  j  =  1 ,n 
and  where  it  has  nontrivially  changing  dynamics.  We  find  that  the  partial  derivative  values 
differ  greatly  from  the  values  in  regions  [0,  T\]  and  [r2,T],  When  [ti,t2]  is  included  in  the 
sampling  region  we  expect  to  recover  good  estimates  for  all  three  parameters  (expectations 
that  are  met  in  [2,  4]). 


Figure  20:  (a)  TSF  and  (b)  GSF  corresponding  to  each  parameter  for  the  logistic  curve  with 
60  =  (17.5,0.7,0.1). 


TSF  GSF 


Our  analytical  observations  are  fully  consistent  with  information  contained  in  the  graphs 
of  the  TSF  illustrated  in  Figure  20(a)  for  T  =  25.  We  note  that  the  curve  Sk  slowly  increases 
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with  time  and  it  appears  that  the  solution  is  insensitive  to  K  until  around  the  flex  point  of 
the  logistic  curve,  which  occurs  shortly  after  t  —  7  in  this  case.  The  sensitivities  sk  and  sr 
both  are  close  to  zero  when  t  is  near  the  origin,  and  hence  we  deduce  that  both  K  and  r 
will  be  difficult  or  impossible  to  obtain  using  data  in  that  region.  Also,  we  observe  that  sXo 
and  sr  are  nearly  zero  in  [15,25],  which  suggests  that  we  will  be  unable  to  estimate  Xq  or  r 
using  observations  in  that  region. 

We  numerically  computed  the  GSF  using  equation  (72)  with  a  =  1  and  the  true  value 
parameters  6q  =  (17.5,  0.7,  0.1).  The  plots  of  these  functions  are  shown  in  Figure  20(b)  where 
one  can  observe  obvious  regions  of  steep  increase  in  each  curve.  For  the  curves  gsXo(t),  gsr(t ) 
and  gsxit),  we  find  by  visual  inspection  that  these  regions  are  approximately  [4.5,  7.5],  [7, 11] 
and  [12,  25],  respectively.  By  the  generalized  sensitivity  theory,  if  we  increase  the  number  of 
data  points  sampled  in  one  of  these  regions,  the  estimation  of  the  corresponding  parameter 
is  expected  to  improve.  This  is  precisely  what  happens  in  the  computational  examples  found 
in  [2,  4], 

While  the  general  algorithms  are  still  under  development,  the  following  scenario  involving 
TSF  and  GSF  in  design  of  experiments  for  data  to  be  used  in  OLS  and  GLS  formulations 
are  envisioned: 

1.  One  proposes  a  mechanism,  interaction,  etc.,  as  represented  by  a  term  or  terms  (such  as 
a  nonlinearity,  probability  distribution,  etc.)  in  a  model  (ODE,  PDE,  etc.).  One  then 
uses  methodology  based  on  the  TSF,  GSF  and  the  Fisher  Information  Matrix  (FIM) 
calculations  to  suggest  design  of  experiments  to  collect  data  (duration  of  experiment, 
sampling  sizes,  frequency  in  time,  space,  age/size  class,  etc.)  to  be  used  in  inverse 
problcm/parameter  estimation  techniques  to  investigate  the  mechanistic  based  terms. 

2.  One  then  designs  and  carries  out  the  experiments  resulting  from  1.  with  guidance  in 
data  collection  (variables  required  to  be  observed,  sampling  frequency,  measurement 
accuracy  needed,  etc)  being  provided  for  each  class  of  models  to  be  used  with  the  data; 
questions  and  models  usually  will  be  driven  by  mechanism  based  formulation. 

3.  Finally,  one  can  carry  out  post  experimental  modeling  analysis  (parameter  estimation 
and  inverse  problems  with  both  OLS  and  GLS  ,  statistical  analysis  of  variance  in 
data  and  model  fits  with  residual  plots,  hypothesis  testing  and  model  comparison  as 
described  in  the  next  several  sections,  Kullback-Leibler  distance  based  and  information 
content  based  model  selection  techniques  such  as  AIC  and  recent  generalizations  [16,  17] 
and  improvements,  etc.)  to  provide  a  modeling  framework  and  methodology  for  future 
investigations  of  the  type  proposed  here.  In  the  post  analysis  one  can  also  carry  out 
verification  and  validation  type  studies  as  well  as  testing  predictive  capabilities.  This 
can  be  done  in  part  by  comparing  the  models  with  data  that  was  not  used  in  the  inverse 
problems  for  estimation  of  parameters. 
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7  Statistically  Based  Model  Comparison  Techniques 


In  previous  sections  we  have  discussed  techniques  (e.g.,  residual  plots)  for  investigating  cor¬ 
rectness  of  the  assumed  statistical  model  underlying  the  estimation  (OLS  or  GLS)  procedures 
used  in  inverse  problems.  To  this  point  we  have  not  discussed  correctness  issues  related  to 
choice  of  the  mathematical  model.  However  there  are  a  number  of  ways  in  which  questions 
related  to  the  mathematical  model  may  arise.  In  general,  modeling  studies  [7,  8]  can  raise 
questions  as  to  whether  a  mathematical  model  can  be  improved  by  more  detail  and/or  fur¬ 
ther  refinement?  For  example,  one  might  ask  whether  one  can  improve  the  mathematical 
model  by  assuming  more  detail  in  a  given  mechanism  (constant  rate  vs.  time  or  spatially 
dependent  rate  -  e.g.,  see  [1]  for  questions  related  to  time  dependent  mortality  rates  during 
sub- lethal  damage  in  insect  populations  exposed  to  various  levels  of  pesticides).  Or  one 
might  question  whether  an  additional  mechanism  in  the  model  might  produce  a  better  fit 
to  data-see  [5,  6,  7]  for  diffusion  alone  or  diffusion  plus  convection  in  cat  brain  transport  in 
grey  vs.  white  matter  considerations. 

Before  continuing  an  important  point  must  be  made:  In  model  comparison  results  out¬ 
lined  below,  there  are  really  two  models  being  compared:  the  mathematical  model  and  the 
statistical  model.  If  one  embeds  the  mathematical  model  in  the  wrong  statistical  model  (for 
example,  assumes  constant  variance  when  this  really  isn’t  true),  then  the  mathematical  model 
comparison  results  using  the  techniques  presented  here  will  be  invalid  (e.g.,  worthless ).  An 
important  remark  in  all  this  is  that  you  must  have  the  mathematical  model  you  want  to 
simplify  or  improve  (e.g.,  test  whether  V  =  0  or  not  in  the  example  below)  embedded  in  the 
correct  statistical  model  (determined  in  large  part  by  the  observation  process),  so  that  the 
comparison  really  is  only  with  regard  to  the  mathematical  model. 

To  provide  specific  motivation,  we  illustrate  the  formulation  of  hypothesis  testing  by 
considering  a  mathematical  model  for  a  diffusion-convection  process.  This  model  was  pro¬ 
posed  for  use  with  experiments  designed  to  study  substance  (labeled  sucrose)  transport  in 
cat  brains,  which  are  heterogeneous,  containing  grey  and  white  matter  [7].  In  general,  the 
transport  of  substance  in  cat’s  brains  can  be  described  by  a  PDE  describing  change  in  time 
and  space.  This  convection/diffusion  model,  which  is  widely  discussed  in  the  applied  math¬ 
ematics  and  engineering  literature,  has  the  form 


du  du  ~d2u 

- b  V —  =  V - . 

dt  dx  dx2 


(74) 


Here,  the  parameter  9  =  (T>,  V),  which  belongs  to  some  admissible  parameter  set  0,  denotes 
the  diffusion  coefficient  T>  and  the  bulk  velocity  V  of  the  fluid,  respectively.  Our  problem: 
test  whether  the  parameter  V  plays  a  significant  role  in  the  mathematical  model.  That  is, 
if  the  model  (74)  represents  a  diffusion-convection  process,  we  seek  to  determine  whether 
diffusion  alone  or  diffusion  plus  convection  best  describes  transport  phenomena  represented 
in  cat  brain  data  sets  {ytj}  for  {u(ti,Xj]6)},  the  concentration  of  labeled  sucrose  at  times 
{ti}  and  location  {xj}.  We  then  may  take  H0  :  V  =  0  and  the  alternative  Ha  :  V  /  0. 
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Consequently,  the  restricted  parameter  set  0#  C  0  defined  by 

0#  =  {6  G  0  :  V  =  0} 

will  be  important.  To  carry  out  these  determinations,  we  will  need  some  model  comparison 
tests  of  analysis  of  variance  (AN OVA)  type  from  statistics  involving  residual  sum  of  squares 
(RSS). 

7.1  RSS  Based  Statistical  Tests 

In  general,  we  assume  an  inverse  problem  with  mathematical  model  f(t,  9)  and  n  observations 
Y  =  {Yj}'j=1.  We  define  an  OLS  performance  criterion 

-  1  n 
J„W  =  Jn(Y,e)  =  -jyr,  -  f(tve)f, 

n  i- 1 

where  our  statistical  model  again  has  the  form 

Yj  =  f(tj,0o)  +  tj,  J  = 

with  {ej}"=1  independent  and  identically  distributed,  E(ej)  =  0  and  constant  variance 
varicj)  =  a2.  As  usual  90  is  the  “true”  value  of  6  which  we  assume  to  exist.  As  noted 
above,  we  use  0  to  represent  the  set  of  all  the  admissible  parameters  6  and  assume  that  0 
is  a  compact  subset  of  Euclidean  space  of  Rp  with  d0  e  0. 

Let  9n(Y)  =  9qLS(Y )  be  the  OLS  estimator  using  Jn  with  corresponding  estimate  9n  = 
0%LS(y)  f°r  a  realization  y  =  {yj}.  That  is, 

9n(Y )  =  arg  min  Jn(Y,  9)  and  9n  =  arg  min  Jn(y,  9). 
dee  dee 

Remarks:  In  most  calculations,  one  actually  uses  an  approximation  fN  to  /,  often  a 
numerical  solution  to  the  ODE  or  PDE  for  modeling  the  dynamical  system.  Here  we  tacitly 
assume  fN  will  converge  to  /  as  the  approximation  improves.  There  are  also  questions  related 
to  approximations  of  the  set  0  when  it  is  infinite  dimensional  (e.g.,  in  the  case  of  function 
space  parameters  such  as  time  or  spatially  dependent  parameters)  by  finite  dimensional 
discretizations  @M .  For  extensive  discussions  related  to  these  questions,  see  [8]  as  well  as  [6] 
where  related  assumptions  on  convergences  fN  — >  /  and  0M  — >  0  are  given.  We  shall  ignore 
these  issues  here,  keeping  in  mind  that  these  approximations  will  also  be  of  importance  in 
the  methodology  discussed  below  in  most  practical  uses. 

In  many  instances,  including  the  motivating  example  given  above,  one  is  interested  in 
using  data  to  address  the  question  whether  or  not  the  “true”  parameter  90  can  be  found  in 
a  subset  0#  C  0  which  we  assume  for  discussions  here  is  defined  by 

QH  =  {9e  0| H9  =  c}  (75) 
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where  H  is  an  r  x  p  matrix  of  full  rank,  and  c  is  a  known  constant. 
In  this  case  we  want  to  test  the  null  hypothesis  H0 :  90  e  0# . 
Define  then 


Oh(Y)  =  arg  min  Jn(Y,9 ) 
e&eH 


and  9fi  =  arg  min  Jn(y,9 ) 
oeeH 


and  observe  that  Jn(Y}9fi)  >  Jn(Y,9n).  We  define  the  related  non-negative  test  statistics 
and  their  realizations,  respectively,  by 

Tn(Y)  =  n(Jn(Y ,  9nH)  -  Jn(Y,  9n ))  and  fn  =  Tn(y)  =  n(Jn(y,  9nH)  -  Jn(y,  9n )). 

One  can  establish  asymptotic  convergence  results  for  the  test  statistics  Tn(Y),  as  given 
in  detail  in  [6].  These  results  can,  in  turn,  be  used  to  establish  a  fundamental  result  about 
much  more  useful  statistics  for  model  comparison.  We  define  these  statistics  by 


Un{Y) 


]niY} 

Jn(Y,en)’ 


(76) 


with  corresponding  realizations  Un  =  Un(y).  We  then  have  the  asymptotic  result  that  is  the 
basis  of  our  ANOVA-type  tests. 

Under  reasonable  assumptions  (very  similar  to  those  required  in  the  asymptotic  sampling 
distribution  theory  discussed  in  previous  sections-see  [6,  8,  31])  involving  regularity  and  the 
manner  in  which  samples  are  taken,  one  can  prove  [6]: 

(a)  We  have  the  estimator  convergence  9n  — >  9q  as  n  — >  oo  with  probability  one; 

(b)  If  Hq  is  true,  Un  — >  U{r)  as  n  oo  where  U  ~  X2(r);  a  X2  distribution  with  r 
degrees  of  freedom. 


An  example  of  the  y2  density  is  depicted  in  Figure  21  where  the  density  for  y2(4)  (y2 
with  r  —  4  degrees  of  freedom)  is  graphed. 


Figure  21:  Example  of  U  ~  y2(4)  density 


In  this  figure  two  parameters  (r,  a)  of  interest  are  shown.  For  a  given  value  r,  the  value 
ol  is  simply  the  probability  that  the  random  variable  U  will  take  on  a  value  greater  than  a. 
That  is,  Prob{U  >  r}  =  a  where  in  hypothesis  testing,  a  is  the  significance  level  and  r  is 
the  threshold. 


46 


We  wish  to  use  this  distribution  to  test  the  null  hypothesis,  H0l  where  we  approximate 
by  Un  ~  y2(r).  If  the  test  statistic,  Un  >  r,  then  we  reject  H0  as  false  with  confidence 
level  (1  —  a)100%.  Otherwise,  we  do  not  reject  H0  as  true.  For  cat  brain  problem,  we  use  a 
y2(l)  table,  which  can  be  found  in  any  elementary  statistics  text  or  online  and  is  given  here 
for  illustrative  purposes. 


Table  8:  y2(l) 


a 

T 

confidence 

.25 

1.32 

75% 

.1 

2.71 

90% 

.05 

3.84 

95% 

.01 

6.63 

99% 

.001 

10.83 

99.9% 

7.1.1  p- values 

The  minimum  value  a*  of  a  at  which  H0  can  be  rejected  is  called  the  p-value.  Thus,  the 
smaller  the  p-value,  the  stronger  the  evidence  in  the  data  in  support  of  rejecting  the  null 
hypothesis  and  including  the  term  in  the  model,  i.e.,  the  more  likely  the  term  should  be  in 
the  model.  We  implement  this  as  follows:  Once  we  compute  Un  =  r ,  then  p  =  a*  is  the  value 
that  corresponds  to  f  on  a  y2  graph  and  so,  we  reject  the  null  hypothesis  at  any  confidence 
level,  c,  such  that  c  <  1  —  a*.  For  example,  if  for  a  computed  t  we  find  p  =  a*  =  .0182, 
then  we  would  reject  H0  at  confidence  level  (1  —  a*)100%  =  98.18%  or  lower.  For  more 
information,  the  reader  can  consult  ANOVA  discussions  in  any  good  statistics  book. 

7.1.2  Alternative  statement 

To  test  the  null  hypothesis  H0,  we  choose  a  significance  level  a  and  use  y2  tables  to  obtain 
the  corresponding  threshold  r  =  r(a)  so  that  P(y2(r)  >  r)  =  a.  We  next  compute  Un  =  r 
and  compare  it  to  r.  If  Un  >  r,  then  we  reject  H0  as  false;  otherwise,  we  do  not  reject  the 
null  hypothesis  H0. 

7.2  Revisiting  the  cat-brain  problem 

We  summarize  use  of  the  above  model  comparison  techniques  outlined  above  by  returning 
to  the  cat  brain  example  discussed  in  detail  in  [7,  8].  There  were  3  sets  of  experimental  data 
examined,  under  the  null-hypothesis  H0  :  V  =  0. 

For  the  Data  Set  1,  we  found  after  carrying  out  the  inverse  problems  over  0  and  Qh, 
respectively, 

Jn(9n )  =  106.15  and  Jn{6nH)  =  180.1, 
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which  gives  us  that  Un  =  5.579  (noting  that  n  =  8  /  oo),  for  which  p  =  a*  =  .0182.  Thus, 
we  reject  H0  in  this  case  at  any  confidence  level  less  than  98.18%.  Thus,  we  should  reject 
that  V  =  0,  which  suggests  convection  is  important  in  describing  this  data  set. 

For  Data  Set  2,  we  found 

Jn(9n)  =  14.68  and  Jn{6nH)  =  15.35, 

and  thus,  in  this  case,  we  have  Un  =  .365,  which  implies  we  do  not  reject  H0  with  high  degrees 
of  confidence  (p- value  very  high).  This  suggests  V  =  0,  which  is  completely  opposite  to  the 
findings  for  Data  Set  1. 

For  the  final  set  ( Data  Set  3)  we  found 

Jn(6n )  =  7.8  and  Jn{0nH)  =  146.71, 

which  yields  in  this  case,  Un  =  15.28.  This,  as  in  the  case  of  the  first  data  set,  suggests  (with 
p  <  .001)  that  V  %  0  is  important  in  modeling  the  data. 

The  difference  in  conclusions  between  the  first  and  last  sets  and  that  of  the  second  set 
is  interesting  and  perhaps  at  first  puzzling.  However,  when  discussed  with  the  doctors  who 
provided  the  data,  it  was  discovered  that  the  first  and  last  set  were  taken  from  the  white 
matter  of  the  brain,  while  the  other  was  taken  from  the  grey  matter.  This  later  hireling  was 
consistent  with  observed  microscopic  tests  on  the  various  matter  (micro  channels  in  white 
matter  that  promote  convective  “how”).  Thus,  it  can  be  suggested  with  a  reasonably  high 
degree  of  confidence,  that  white  matter  exhibits  convective  transport,  while  grey  matter  does 
not. 
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8  Epi  Model  Comparison 

We  return  to  the  previously  introduced  epidemiological  model  as  another  example  of  a  way  in 
which  the  model  comparison  statistic  may  be  used.  Here  we  apply  this  statistic  to  determine 
whether  a  more  sophisticated  model  is  appropriate  based  on  the  surveillance  data  from  the 
Australian  NNDS  website.  Here  we  introduce  the  modified  model  and  describe  the  test 
statistic  for  this  example.  We  then  present  the  results  from  the  least  squares  estimation 
procedure  in  both  the  cases  of  the  simplified  and  more  complex  model,  and  finally,  interpret 
the  conclusions  indicated  by  the  test  statistic. 

So  far,  in  our  model  of  invasive  pneumococcal  disease  dynamics,  we  have  considered  the 
progression  of  individuals  from  a  colonized  to  an  infected  state  by  a  constant  linear  per  capita 
rate.  However,  this  is  a  gross  simplification  of  more  complex  physiological  processes,  many 
of  which  occur  within  the  individual  and  would  likely  require  more  sophisticated  mathemat¬ 
ical  representations.  But  it  is  also  possible  that  at  a  population  level,  this  linear  term  may 
sufficiently  capture  the  dynamics  of  the  infections  when  the  model  solutions  are  compared 
with  observed  data.  One  specific  mechanism  that  we  can  explicitly  consider  is  ‘exogenous 
reinfection’,  that  is,  the  establishment  of  an  infection  within  a  colonized  individual  through 
repeated  exposure  to  S.  pneumoniae  via  contacts  with  other  individuals  harboring  the  bac¬ 
teria.  The  inclusion  of  this  mechanism  results  in  the  following  modified  model  equations 


dS_ 

dt 

dE 

dt 

dSv 

dt 

dEy 

dt 


A  -  (31SE  +  Ev  +  I  +  Iv  +aE  +  'yI-(j)S  +  pSv-  pS  (77) 

(31SE  +  Ev  +  I  +  Iv  -aE-  ln(t)E  -  <j>E  +  pEv  -  pE  -  l(52EE  +  Ey  +  1  +  — 

(78) 

(f)S  —  ef3\Sy — — V  +  aEy  +  7  Iv  —  pSy  —  pSy  (79) 

e/3iSy - — - l—j^r - V  —  aEy  +  <j)E  —  pEy  —  Shi(t)Ey 


—  pEy  —  8(5 2 


E  +  Ey  +  I  +  Iy 
N 


dl 

dt 

dly 

dt 


ln(t)E  +  l(52E 
8n(t)Ey  +  8(52 


E  +  Ey  +  /  +  Iy 
N 

E  +  Ey  +  /  +  Iy 
N 


(7  +  V  +  n)I 

(7  +  If  +  v)Iv- 


(80) 

(81) 

(82) 


8.1  Surveillance  Data 


Our  interpretation  of  the  case  notification  data  must  also  be  modified  to  reflect  the  additional 
infection  mechanism,  so  that  the  number  of  new  cases  is  now 
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Yj  ~  }{tj,  0)  =  I  J+1  [IkE  +  l(32EE  +  Ev  +  1  +  —  +  SkEv 

Jtj  A 

i  xn  1 7  E  +  +  I  +  j 

+  d/32£v - ^ - Jas, 


where  j  =  1,...,36.  We  estimate  parameters  0  =  (/3i,  k0,  aci,  5,  /32)T  now  from  these  36 
monthly  cases,  and  from  the  corresponding  annual  reports  of  which  of  these  cases  were 
vaccinated  or  unvaccinated.  These  data  are  represented  by 


and 


Yi~f{ti,e)  = 


H+l 


IkE  +  l(32E- 


v 


N 


ds , 


Yi  ~  fit,,  9)  — 


cti  + 1 


SkEy  +  8/32Ey- 


E  +  Ev  +  I  +  I- 


V 


N 


ds, 


for  i  —  1,2, 3,  and  tt  =  1, 13, 25,  37  months  for  %  —  1, . . . ,  4.  Again,  we  assume  the  statistical 
model  Yj  =  f(tj,90 )  +  where  =  0,  var(ej)  =  cr^  for  all  j  =  1,...,36,  and  Yt  = 

f(ti,90 )  +  et  where  E[ei\  =  0,  uar(ej)  =  erf  for  all  i  =  1,2,3.  Thus,  we  have  assumed  that 
the  variance  is  constant  longitudinally,  but  not  equivalent  across  types  of  observations.  That 
is,  it  is  likely  that  there  is  more  variation  in  the  annually  reported  observations  than  those 
reported  on  a  monthly  basis.  The  least  squares  estimation  procedure  is  described  in  more 
detail  in  [32], 


8.2  Test  Statistic 

Here  we  describe  the  application  of  a  test  statistic  to  this  example  to  compare  the  modified 
model  to  the  comparably  simpler  model.  The  statistic  will  provide  a  basis  from  which  to 
decide  whether  the  observed  data  warrants  the  additional  complexity  incorporated  in  the 
above  model. 

From  the  n  =  42  observations  Yj  approximated  by  the  model  quantities  f(tj ,  9),  we  seek 
to  estimate  parameters  9  —  (fJi,  k0,  K\,  5,(32)t.  We  obtain  these  estimates  via  a  least  squares 
estimation  process  in  which  our  estimate  for  9  minimizes  an  objective  functional  Jn{9).  When 
f(tj,9 )  is  that  for  the  more  sophisticated  model  above, 

9  =  argmin  Jn{9 ), 

7e© 

where  0  C  is  a  (compact)  feasible  parameter  set.  The  constraint  operator  77  :  R5  — »  M1 
of  (75)  for  our  example  is  then  the  1x5  vector  H  =  (0,  0,  0,  0, 1)  and  c  =  0. 

Thus  the  reduced  parameter  space  (in  the  case  of  the  reduced  model,  where  j32  =  0),  is 
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Tabic  9:  Parameter  estimates  without  and  with  ‘exogenous  reinfection’. 


0 

Oh 

((32  =  0) 

SE(0h ) 

0 

(P2  0) 

SE(0) 

P 

1.52175 

0.02 

1.52287 

0.0029 

K  0 

1.3656e-3 

1.3e-4 

1.3604e-3 

0.0012 

ft  1 

0.56444 

0.04 

0.5632 

0.52 

S 

0.7197 

0.06 

0.71125 

0.38 

(32 

N/A 

N/A 

2.2209e-14 

0.01 

0^  =  {0  g  0  :  HO  =  0}  =  {0  G  0  :  (32  =  0}. 

The  estimate  for  0  over  Oh  is  denoted  by  6h,  and  is  found  by  minimizing  the  same 
objective  functional  over  the  smaller  parameter  space  Oh,  he., 


0H  =  arg  min  Jn(0). 

0£&H 


We  use  the  test  statistic 


TT  Jn(0H )  ~  Jn{6) 

Jn(0) 

which,  under  reasonable  conditions,  converges  to  a  y2(l)  distribution.  For  a  significance 
level  a,  we  would  find  a  threshold  r  such  that  Pr{y2(l)  >  r}  =  a.  Then  if  Un  >  r,  we 
reject  our  null  hypothesis  as  false,  otherwise  we  do  not  reject.  Our  null  hypothesis  in  this 
case  is  H0  :  K0o  =  0,  or  that  the  true  (32  =  0. 

8.3  Inverse  Problem  Results 

In  this  section  we  compare  the  results  of  the  least  squares  estimation  procedure  with  and 
without  the  ‘exogenous  reinfection’  term.  The  same  set  of  surveillance  data,  described  in 
Section  8.1,  is  used  in  both  cases.  The  parameter  estimates  and  corresponding  standard 
errors  are  shown  in  Table  9. 

The  parameter  estimates  themselves,  Oh  and  0 ,  do  not  differ  significantly.  Although, 
the  standard  errors  indicate  that  our  ability  to  estimate  (3  and  n(t)  does  change  drastically 
depending  on  whether  or  not  the  two  mechanisms  of  infection  are  considered.  When  the 
reinfection  term  is  considered,  we  see  that  the  standard  error  for  this  particular  parameter 
indicates  that  our  data  do  not  provide  a  significant  amount  of  information  on  this  process. 
However,  the  smaller  residual,  RSS ,  when  the  objective  functional  is  minimized  over  a 
larger  parameter  space  (when  /32  0 ) ,  might  indicate  that  including  the  extra  term  provides 
a  better  fit.  To  resolve  these  two  seemingly  contrasting  pieces  of  information,  we  turn  to  the 
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Figure  22:  Best  fit  model  solutions  to  monthly  case  notifications  with  constant  variance 


error. 


Best  Fit  Model  Solution,  p2  =  0 
RSS  =  4,244.5 


'l _ . _ . _ . _ l  — l _ . _ • _ . _ l 

1  1.5  2  2.5  3  1  1.5  2  2.5  3 

t  (years)  t  (years) 


Best  Fit  Model  Solution,  p2  *  0 
RSS  =  4,220.8 


test  statistic  to  determine  if  the  difference  in  residuals  is  enough  to  justify  the  inclusion  of 
this  extra  infection  rate. 


8.4  Model  Comparison 

The  test  statistic  can  be  calculated  as 


„  Jn(0H)-Ue)  4,244.5-4,220.8  „  „„„ 

Ur)  -  Ti  ^  -  42  X  _ _ _  _  (J.ZOU. 


Jn{0) 


4,220.8 


Note  that  the  residual  sum  of  squares  is  the  value  of  the  objective  function,  so  that  RSS  = 
Jn{9).  We  compare  this  to  a  y2(l)  table  (see  Table  8)  and  see  that  even  at  a  significance 
level  of  only  75%  we  cannot  reject  our  null  hypothesis.  That  is,  the  difference  in  residuals, 
and  hence  the  improvement  of  the  model  fits  to  this  data  (with  n  =  42),  is  not  sufficient  to 
warrant  including  the  additional  infection  mechanism.  This  does  not  mean  that  reinfection 
does  not  occur,  but  it  does  suggest  that  to  accurately  capture  the  dynamics  of  the  population, 
as  evidenced  by  this  surveillance  data,  it  is  reasonable  to  neglect  this  term.  Therefore,  we 
conclude  that  “reinfection”  is  not  sufficiently  present  in  this  data  to  argue  for  inclusion  of 
this  term  in  population  level  models  of  the  infection  dynamics. 
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9  Concluding  Remarks 

As  might  be  expected,  mathematical  and  statistical  models  cannot  fully  represent  reality 
in  most  scientific  situations.  The  best  that  one  can  hope  is  that  models  can  approximate 
reality  as  presented  by  data  from  experiments  sufficiently  well  to  be  useful  in  promoting  basic 
understanding  as  well  as  prediction.  We  have  in  this  presentation  outlined  some  techniques 
for  evaluation  of  assumptions  regarding  statistical  models  as  well  as  comparison  techniques 
for  mathematical  models  under  the  assumption  that  the  statistical  model  assumed  is  correct. 
The  RSS  based  techniques  discussed  represent  just  one  (which  happens  to  enjoy  a  rigorous 
theoretical  foundation!)  of  many  model  comparison/selection  techniques  available  in  a  large 
literature.  For  example,  among  a  wide  class  of  so-called  “model  selection1'  methods  (some  of 
which  are  heuristic  in  nature)  are  those  based  on  Kullbeck-Leibler  information  loss.  Among 
the  best  known  of  these  is  the  Akaike’s  Information  Criterion  (AIC)  selection  procedure  and 
its  numerous  variations  (AICc,TIC,  etc.)  [13,  14,  15,  16,  17,  28]  as  well  as  Bayesian  model 
selection  (e.g.,  BIC)  procedures.  While  these  are  important  modeling  tools,  space  limitations 
prohibit  their  discussion  here. 

Finally,  we  have  also  limited  our  discussions  to  estimation  problems  based  on  OLS  and 
GLS  with  appropriate  corresponding  data  noise  assumptions  of  constant  variance  and  non¬ 
constant  variance  (relative  error),  respectively.  There  are  many  other  important  approaches 
(e.g.,  regularization,  asymptotic  embedding,  perturbation,  equation  error,  adaptive  filtering 
and  identification,  and  numerous  Bayesian  based  techniques-see  [8,  12,  20,  23,  27,  33,  35] 
and  the  references  therein)  which  again  we  ignore  because  of  space  limitations. 
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